I have a data frame but it has many overlapping rows. However, I cannot erase it because not all values are equal. data consists of 5 * 18000000 values.
date station_number station latitude longitude
0101 11428 hansung 0 127.5
0101 11428 hansung 0 127.7
0101 11374 bookmuseum 0 127.3
0101 11380 mokryeon 14 127.9
0101 11380 mokryeon 14 128.1
0101 11388 healthcent 86 126.1
I want to erase row number 2 and row number 5 and all the other rows that have same date, station_number and station name. Or I want to unify rows of same values anyway. I need your help.
You could remove rows with duplicated
values across multiple columns like this:
df [!duplicated(df[c(1:3)]),]
#> date station_number station latitude longitude
#> 1 101 11428 hansung 0 127.5
#> 3 101 11374 bookmuseum 0 127.3
#> 4 101 11380 mokryeon 14 127.9
#> 6 101 11388 healthcent 86 126.1
library(dplyr)
df %>%
distinct(date, station_number, station, .keep_all = TRUE)
#> date station_number station latitude longitude
#> 1 101 11428 hansung 0 127.5
#> 2 101 11374 bookmuseum 0 127.3
#> 3 101 11380 mokryeon 14 127.9
#> 4 101 11388 healthcent 86 126.1
Created on 2022-12-10 with reprex v2.0.2
Data:
df <- read.table(text = 'date station_number station latitude longitude
0101 11428 hansung 0 127.5
0101 11428 hansung 0 127.7
0101 11374 bookmuseum 0 127.3
0101 11380 mokryeon 14 127.9
0101 11380 mokryeon 14 128.1
0101 11388 healthcent 86 126.1', header = TRUE)
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments