Remove duplicate rows, and keep newest row based on date column

Bjarke Mønsted

I have a huge list of data in Excel (250,000+ rows) in the following format:

Number  Value1  Date            Value2
40325   1       21/01/11 18.10  2
65485   3       22/01/11 16.47  2
40325   9       25/01/11 19.00  0
70912   8       27/01/11 16.43  2

I need to remove duplicate rows based on column 1 (Number), and have no problem doing this using "Data/Remove Duplicates" in Excel, but I need to make sure that I remove the row with the oldest date, and keep the newest, based on column 3 (Date).

In the example above, I would need to remove row 1 and keep row 3, since row 3 is the newest.

I have 4,800 rows with duplicates, so sorting and removing them manually would be a very time-consuming job.

Any good suggestions or tricks to help me out? Thanks a lot in advance :)

nixda

The trick is to sort your table before using Remove Duplicates. Excel always keeps the first occurrence of a duplicated value and removes every later row with the same value.

In your case:

  1. Set up a helper column and fill it with consecutive numbers. Start at 1 and use autofill down to the end of your table

  2. Make sure your date column is formatted as a date and that Excel recognizes the values as dates. Otherwise the sort won't work

  3. Choose Custom Sort (where you find it depends on your Excel version). Sort your whole table by the date column from Newest to Oldest. That's the important part

  4. Use Remove Duplicates and select only your Number column, since that is the column that defines a duplicate. Deselect all other columns

  5. Choose Custom Sort again and sort your table by the helper column added in step 1 to get your original row order back
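
If you would rather script it than click through Excel, the same five steps can be sketched in plain Python. This is only an illustration and not part of the Excel workflow above. It assumes the rows have already been read into tuples, and that the dates are day/month/two-digit year followed by hour.minute, as the sample data suggests. The names `parse_date` and `keep_newest` are made up for this example.

```python
from datetime import datetime

# Sample rows from the question: (Number, Value1, Date, Value2).
rows = [
    (40325, 1, "21/01/11 18.10", 2),
    (65485, 3, "22/01/11 16.47", 2),
    (40325, 9, "25/01/11 19.00", 0),
    (70912, 8, "27/01/11 16.43", 2),
]


def parse_date(text):
    # Assumption: dd/mm/yy followed by hh.mm, as in the sample data.
    return datetime.strptime(text, "%d/%m/%y %H.%M")


def keep_newest(rows):
    # Step 1: the helper column is each row's original position.
    indexed = list(enumerate(rows))
    # Steps 2-3: parse the dates and sort newest first.
    indexed.sort(key=lambda item: parse_date(item[1][2]), reverse=True)
    # Step 4: keep only the first row seen for each Number.
    seen = set()
    kept = []
    for position, row in indexed:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append((position, row))
    # Step 5: sort by the helper column to restore the original order.
    kept.sort(key=lambda item: item[0])
    return [row for _, row in kept]


result = keep_newest(rows)
for row in result:
    print(row)
```

For the sample data this drops the 21/01/11 row for 40325 and keeps the 25/01/11 one, with the remaining rows in their original order.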
