Wednesday 8 September 2010

[Tip] Find duplicate rows in spreadsheet

A number of times in the past I have helped users to remove duplicates rows from excel or calc spreadsheets.

Below is the process I have used.

  1. Order the spreadsheet by the columns from which you want to remove duplicates. This will group together all duplicate rows.
  2. Create a calculation in an empty field to concatenate all fields that are being duplicated. For example '=CONCATENATE(A1,B1,C1)'. With A,B,C being the columns that contain duplicate data.
  3. Create another calculation in an empty field to compare each rows concatenated result column with the previous rows concatenated result column. For example '=if(D1=D2,"duplicate","unique")'. With D being the column with the concatenate calculation in it.
This will give you a list of "duplicates" and "uniques" to identify if a row is unique or a duplicate of the previous row. This list can be filtered on "unique" or copied to a new sheet to get a set of unique rows.

Below is an example google spreadsheet to give an idea of the columns required accomplish the task



Hope someone finds this useful

No comments: