Below is the process I have used.
- Order the spreadsheet by the columns from which you want to remove duplicates. This will group together all duplicate rows.
- Create a calculation in an empty field to concatenate all fields that are being duplicated. For example '=CONCATENATE(A1,B1,C1)'. With A,B,C being the columns that contain duplicate data.
- Create another calculation in an empty field to compare each rows concatenated result column with the previous rows concatenated result column. For example '=if(D1=D2,"duplicate","unique")'. With D being the column with the concatenate calculation in it.
This will give you a list of "duplicates" and "uniques" to identify if a row is unique or a duplicate of the previous row. This list can be filtered on "unique" or copied to a new sheet to get a set of unique rows.
Below is an example google spreadsheet to give an idea of the columns required accomplish the task
Hope someone finds this useful
No comments:
Post a Comment