How can I remove rows that are partial duplicates in R?

kang yep sng

I have a data frame with many overlapping rows. However, I cannot simply drop them as exact duplicates, because not all of their values are equal. The data consists of 5 * 18000000 values.

date     station_number    station       latitude   longitude
0101     11428             hansung       0          127.5
0101     11428             hansung       0          127.7
0101     11374             bookmuseum    0          127.3
0101     11380             mokryeon      14         127.9
0101     11380             mokryeon      14         128.1
0101     11388             healthcent    86         126.1

I want to remove rows 2 and 5, and every other row that repeats the date, station_number and station name of an earlier row. Alternatively, I would be happy to merge rows with the same values into one. I need your help.

Quinten

You could remove rows with duplicated values across multiple columns like this:

# Base R: keep the first row for each combination of the first three columns
df[!duplicated(df[1:3]), ]
#>   date station_number    station latitude longitude
#> 1  101          11428    hansung        0     127.5
#> 3  101          11374 bookmuseum        0     127.3
#> 4  101          11380   mokryeon       14     127.9
#> 6  101          11388 healthcent       86     126.1
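
If the last row of each group should be kept rather than the first, `duplicated()` also accepts `fromLast = TRUE`. A minimal sketch on the same sample data (rebuilt here so the block runs on its own; the names `key_cols` and `df_last` are only illustrative):

```r
# Sample data from the question
df <- read.table(text = 'date     station_number    station   latitude   longitude
0101      11428            hansung      0         127.5
0101      11428            hansung      0         127.7
0101      11374           bookmuseum    0         127.3
0101      11380            mokryeon     14        127.9
0101      11380            mokryeon     14        128.1
0101      11388            healthcent   86        126.1', header = TRUE)

# Columns that define a "partial duplicate"
key_cols <- c("date", "station_number", "station")

# fromLast = TRUE flags every occurrence except the last one in each group,
# so negating it keeps the last row per key combination
df_last <- df[!duplicated(df[key_cols], fromLast = TRUE), ]
df_last
```

Selecting the key columns by name instead of by position also protects against the column order changing later.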

# dplyr: the same result, naming the key columns explicitly
library(dplyr)
df %>%
  distinct(date, station_number, station, .keep_all = TRUE)
#>   date station_number    station latitude longitude
#> 1  101          11428    hansung        0     127.5
#> 2  101          11374 bookmuseum        0     127.3
#> 3  101          11380   mokryeon       14     127.9
#> 4  101          11388 healthcent       86     126.1

Created on 2022-12-10 with reprex v2.0.2
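
The question also mentions wanting to "unify" rows instead of dropping them. One possible reading, which is an assumption since the question does not say how conflicting values should be combined, is to average latitude and longitude within each date/station_number/station group. A minimal base R sketch (the name `merged` is only illustrative):

```r
# Sample data from the question
df <- read.table(text = 'date     station_number    station   latitude   longitude
0101      11428            hansung      0         127.5
0101      11428            hansung      0         127.7
0101      11374           bookmuseum    0         127.3
0101      11380            mokryeon     14        127.9
0101      11380            mokryeon     14        128.1
0101      11388            healthcent   86        126.1', header = TRUE)

# Collapse each key combination into one row.
# Assumption: the mean is an acceptable way to combine differing coordinates.
merged <- aggregate(
  cbind(latitude, longitude) ~ date + station_number + station,
  data = df,
  FUN = mean
)
merged
```

Unlike `duplicated()` or `distinct()`, this does not discard the second longitude reading; whether averaging makes sense depends on why the coordinates differ in the first place.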


Data:

df <- read.table(text = 'date     station_number    station   latitude   longitude
0101      11428            hansung      0         127.5
0101      11428            hansung      0         127.7
0101      11374           bookmuseum    0         127.3
0101      11380            mokryeon     14        127.9
0101      11380            mokryeon     14        128.1
0101      11388            healthcent   86        126.1', header = TRUE)
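
Note that `read.table()` parses `0101` as the integer `101`, which is why the output above shows `101` in the date column. If the leading zero matters, one option is to read that column as character via `colClasses`. A minimal sketch (the name `df_chr` is only illustrative):

```r
# Read the date column as character so "0101" keeps its leading zero;
# the other columns are still type-converted automatically
df_chr <- read.table(text = 'date     station_number    station   latitude   longitude
0101      11428            hansung      0         127.5
0101      11428            hansung      0         127.7
0101      11374           bookmuseum    0         127.3
0101      11380            mokryeon     14        127.9
0101      11380            mokryeon     14        128.1
0101      11388            healthcent   86        126.1',
  header = TRUE, colClasses = c(date = "character"))

# De-duplication works the same way on the character column
df_chr[!duplicated(df_chr[c("date", "station_number", "station")]), ]
```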
