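The scenario is racing... sometimes a driver races against a competitor, and sometimes they just race alone. The drivers and their skill levels are always completely random. A race ends after lap 12, and one race is held every day for 10 years. There are hundreds of drivers. Independent observers recorded data during the races, including the driver's speed, but only for one of the drivers! So the competitor data is missing. Here are the first 6 rows of the data: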
df <- data.frame(
Driver_name = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
Driver_level = c("A", "C", "D", "A", "B", "B"),
Driver_speed = c(96, 91, 89, 94, 88, 99),
Competitor= c("Yes", "Yes", "Yes", "Yes", "No", "No"),
Comp_name= c("Julie", "Rick", "Johnny", "Denver", "NA", "NA"),
Comp_level= c("B", "B", "D", "A", "NA", "NA"),
Comp_speed= c("???", "???", "???", "???", "NA", "NA"),
Race_day= c(165, 165, 72, 72, 92, 65),
Lap_number= c(9, 9, 12, 12, 8, 4),
Humidity= c(33, 33, 88, 88, 12, 55),
Temperature= c(28, 28, 12, 12, 20, 28)
)
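Each row represents a different driver, but I need to fill in the data so I know the competitor's speed! I'll enter the speeds manually to demonstrate what needs to be done for the rest of the dataset: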
df_1 <- data.frame(
Driver_name = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
Driver_level = c("A", "C", "D", "A", "B", "B"),
Driver_speed = c(96, 91, 89, 94, 88, 99),
Competitor= c("Yes", "Yes", "Yes", "Yes", "No", "No"),
Comp_name= c("Julie", "Rick", "Johnny", "Denver", "NA", "NA"),
Comp_level= c("B", "B", "D", "A", "NA", "NA"),
Comp_speed= c(91, 96, 94, 89, NA, NA),
Race_day= c(165, 165, 72, 72, 92, 65),
Lap_number= c(9, 9, 12, 12, 8, 4),
Humidity= c(33, 33, 88, 88, 12, 55),
Temperature= c(28, 28, 12, 12, 20, 28)
)
This is an ideal job for a left_join.
We load the dplyr package:
#install.packages("dplyr") #if you don't have it
library(dplyr)
Let's drop the Comp_speed column, which currently holds "???" values.
df <- df %>% select(-Comp_speed)
Now let's create a data frame containing only the names and speeds, renaming Driver_speed to Comp_speed on the fly.
df2 <- df %>%
select(Driver_name, Comp_speed = Driver_speed)
Now we can left_join the df data frame to df2, matching Comp_name in df with Driver_name in df2:
df_updated <- df %>%
left_join(df2, by = c("Comp_name" = "Driver_name"))
#> Warning: Column `Comp_name`/`Driver_name` joining factors with different
#> levels, coercing to character vector
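The warning is harmless here. It appears because, before R 4.0.0, data.frame() converted character vectors to factors by default (stringsAsFactors = TRUE), so Comp_name and Driver_name end up as factors with different level sets and dplyr falls back to character. A minimal sketch that keeps the key columns as character from the start, assuming only the columns the join needs; the names drivers_chr, speeds and joined are hypothetical, and it uses a real NA instead of the string "NA" for solo races:

```r
library(dplyr)

# Only the columns needed for the join. stringsAsFactors = FALSE keeps the
# names as character vectors (already the default from R 4.0.0 onward).
drivers_chr <- data.frame(
  Driver_name  = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Driver_speed = c(96, 91, 89, 94, 88, 99),
  Comp_name    = c("Julie", "Rick", "Johnny", "Denver", NA, NA),
  stringsAsFactors = FALSE
)

# Lookup table: one speed per driver name, renamed to Comp_speed.
speeds <- drivers_chr %>% select(Driver_name, Comp_speed = Driver_speed)

# Both key columns are character, so no factor-level warning is raised.
joined <- drivers_chr %>%
  left_join(speeds, by = c("Comp_name" = "Driver_name"))

joined$Comp_speed
```

This gives the same Comp_speed values as the df_updated printout (91, 96, NA, 89, NA, NA), just without the coercion warning.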
Here is the resulting data frame, df_updated:
df_updated
#> Driver_name Driver_level Driver_speed Competitor Comp_name Comp_level
#> 1 Rick A 96 Yes Julie B
#> 2 Julie C 91 Yes Rick B
#> 3 Denver D 89 Yes Johnny D
#> 4 Johny A 94 Yes Denver A
#> 5 Cassandra B 88 No NA NA
#> 6 Phillip B 99 No NA NA
#> Race_day Lap_number Humidity Temperature Comp_speed
#> 1 165 9 33 28 91
#> 2 165 9 33 28 96
#> 3 72 12 88 12 NA
#> 4 72 12 88 12 89
#> 5 92 8 12 20 NA
#> 6 65 4 55 28 NA
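Note that row 3 (Denver) also gets NA, even though he had a competitor: his competitor is recorded as "Johnny", while the driver appears as "Johny", so the join finds no match. Whether that is a typo in the sample or a genuinely different person is something only the data owner can say. A minimal sketch, assuming the same sample rows (races_chk, known_drivers and unmatched are hypothetical names), that uses dplyr's anti_join to list head-to-head rows whose competitor name never appears as a Driver_name before trusting the joined result:

```r
library(dplyr)

# Sample rows from the question; NA (not the string "NA") marks solo races.
races_chk <- data.frame(
  Driver_name = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Competitor  = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  Comp_name   = c("Julie", "Rick", "Johnny", "Denver", NA, NA),
  stringsAsFactors = FALSE
)

# Every distinct name that appears as a driver.
known_drivers <- races_chk %>% distinct(Driver_name)

# Head-to-head rows whose competitor name has no matching Driver_name;
# these are exactly the rows the left_join leaves with Comp_speed = NA.
unmatched <- races_chk %>%
  filter(Competitor == "Yes") %>%
  anti_join(known_drivers, by = c("Comp_name" = "Driver_name"))

unmatched
```

On this sample it returns the single Denver row with Comp_name "Johnny".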
As the OP pointed out, this isn't robust for drivers who race more than once (an oversight on my part).
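Assuming (from the data) that the Race_day and Lap_number variables are enough to identify each head-to-head race, we simply keep them in the df2 data frame and then add those column names to the by argument of the left_join. Here's what that looks like: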
df2 <- df %>%
select(Driver_name, Comp_speed = Driver_speed, Race_day, Lap_number)
df_updated <- df %>%
left_join(df2, by = c("Comp_name" = "Driver_name", "Race_day", "Lap_number"))
#> Warning: Column `Comp_name`/`Driver_name` joining factors with different
#> levels, coercing to character vector
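To see why the extra keys matter, here is a small sketch with hypothetical data: Rick and Julie race each other twice, on day 165 and on an invented day 200. It assumes, as above, that Race_day plus Lap_number identifies a single head-to-head race. Joining on the name alone matches every row against both of the competitor's races and duplicates rows (recent dplyr versions also warn about a many-to-many relationship here), while the composite key gives one correct match per row:

```r
library(dplyr)

# Hypothetical data: the same pair of drivers meets in two different races.
races <- data.frame(
  Driver_name  = c("Rick", "Julie", "Rick", "Julie"),
  Driver_speed = c(96, 91, 90, 93),
  Comp_name    = c("Julie", "Rick", "Julie", "Rick"),
  Race_day     = c(165, 165, 200, 200),
  Lap_number   = c(9, 9, 12, 12),
  stringsAsFactors = FALSE
)

# Lookup table keyed by driver and race.
lookup <- races %>%
  select(Driver_name, Comp_speed = Driver_speed, Race_day, Lap_number)

# Name-only key: each row matches both of the competitor's races (4 x 2 = 8 rows).
by_name <- races %>%
  left_join(select(lookup, Driver_name, Comp_speed),
            by = c("Comp_name" = "Driver_name"))

# Composite key: each row matches only the competitor's row in the same race.
by_race <- races %>%
  left_join(lookup, by = c("Comp_name" = "Driver_name", "Race_day", "Lap_number"))

nrow(by_name)
by_race$Comp_speed
```

Here nrow(by_name) is 8, while by_race keeps the original 4 rows with Comp_speed 91, 96, 93, 90.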