How can I remove rows that are partial duplicates in R?

kang yep sng

I have a data frame with many overlapping rows. I cannot simply drop exact duplicates, because not all values in those rows are equal. The data consists of 5 * 18,000,000 values.

date     station_number    station      latitude   longitude
0101     11428             hansung      0          127.5
0101     11428             hansung      0          127.7
0101     11374             bookmuseum   0          127.3
0101     11380             mokryeon     14         127.9
0101     11380             mokryeon     14         128.1
0101     11388             healthcent   86         126.1

I want to remove rows 2 and 5, and likewise every other row that repeats the date, station_number and station of an earlier row. Alternatively, I would be happy to merge such rows into one. I need your help.

Quinten

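You can drop rows whose values in the first three columns (date, station_number, station) repeat an earlier row, keeping the first occurrence of each combination. Here is how, first in base R and then with dplyr: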

df[!duplicated(df[1:3]), ]
#>   date station_number    station latitude longitude
#> 1  101          11428    hansung        0     127.5
#> 3  101          11374 bookmuseum        0     127.3
#> 4  101          11380   mokryeon       14     127.9
#> 6  101          11388 healthcent       86     126.1
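
To see why the base R version works, it can help to look at the logical vector that duplicated() returns. The following is a minimal sketch that rebuilds the sample data inline so it runs on its own; the names `key_cols`, `is_dup`, `keep_first` and `keep_last` are illustrative and not part of the original answer.

```r
# Sample data from the question, rebuilt so this block is self-contained
df <- read.table(text = "date station_number station latitude longitude
0101 11428 hansung 0 127.5
0101 11428 hansung 0 127.7
0101 11374 bookmuseum 0 127.3
0101 11380 mokryeon 14 127.9
0101 11380 mokryeon 14 128.1
0101 11388 healthcent 86 126.1", header = TRUE)

# Columns that define a "duplicate" (hypothetical helper name)
key_cols <- c("date", "station_number", "station")

# TRUE marks a row whose key already appeared in an earlier row
is_dup <- duplicated(df[key_cols])
is_dup
#> [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

# Keep the first occurrence of each key (same result as above)
keep_first <- df[!is_dup, ]

# Keep the last occurrence instead by scanning from the bottom
keep_last <- df[!duplicated(df[key_cols], fromLast = TRUE), ]
```

Rows 2 and 5 are the ones flagged, which matches the rows the question asks to remove. Whether the first or the last occurrence is the right one to keep depends on the data.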

library(dplyr)
df %>%
  distinct(date, station_number, station, .keep_all = TRUE)
#>   date station_number    station latitude longitude
#> 1  101          11428    hansung        0     127.5
#> 2  101          11374 bookmuseum        0     127.3
#> 3  101          11380   mokryeon       14     127.9
#> 4  101          11388 healthcent       86     126.1

Created on 2022-12-10 with reprex v2.0.2
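
The question also mentions unifying rows instead of dropping them. One possible reading of that, and it is only an assumption, is to collapse each group into a single row by averaging latitude and longitude. A sketch with base R's aggregate(), where `unified` is an illustrative name:

```r
# Sample data from the question, rebuilt so this block is self-contained
df <- read.table(text = "date station_number station latitude longitude
0101 11428 hansung 0 127.5
0101 11428 hansung 0 127.7
0101 11374 bookmuseum 0 127.3
0101 11380 mokryeon 14 127.9
0101 11380 mokryeon 14 128.1
0101 11388 healthcent 86 126.1", header = TRUE)

# One row per (date, station_number, station), with the coordinates
# averaged over the group. Averaging is an assumption about what
# "unify" should mean; swap FUN for min, max, etc. as needed.
unified <- aggregate(
  cbind(latitude, longitude) ~ date + station_number + station,
  data = df,
  FUN = mean
)
unified
```

Unlike the two approaches above, this changes the coordinate values, and aggregate() returns the groups sorted by the grouping columns instead of in their original order.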

Data:

df <- read.table(text = 'date     station_number    station   latitude   longitude
0101      11428            hansung      0         127.5
0101      11428            hansung      0         127.7
0101      11374           bookmuseum    0         127.3
0101      11380            mokryeon     14        127.9
0101      11380            mokryeon     14        128.1
0101      11388            healthcent   86        126.1', header = TRUE)
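
Note that read.table() guesses that date is an integer column, which is why the outputs above show 101 and not 0101. If the leading zero matters, one option is to state the column classes explicitly. This is a sketch; `df_chr` is an illustrative name, and the classes are inferred from the sample data.

```r
# Read date as character so "0101" keeps its leading zero.
# The classes are given in column order:
# date, station_number, station, latitude, longitude
df_chr <- read.table(text = "date station_number station latitude longitude
0101 11428 hansung 0 127.5
0101 11428 hansung 0 127.7
0101 11374 bookmuseum 0 127.3
0101 11380 mokryeon 14 127.9
0101 11380 mokryeon 14 128.1
0101 11388 healthcent 86 126.1",
  header = TRUE,
  colClasses = c("character", "integer", "character", "numeric", "numeric"))

df_chr$date[1]
#> [1] "0101"

# De-duplication works the same way on the character column
df_chr[!duplicated(df_chr[1:3]), ]
```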
