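This is related to the post "Merge dfs by common columns by importing selected columns in R".
I would like to merge several data frames by the names column when not all of them contain the same observations. If an observation is missing from a data frame, the corresponding value should be shown as 0 instead.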
My data:
df  <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"), S1 = c(1, 2, 2, 0, 1), S2 = c(2, 50, 40, 30, 22), S3 = c(0, 100, 135, 256, 303), S4 = c(0, 10, 17, 73, 74), check.names = FALSE)
df2 <- data.frame(names = c("Obs1", "Obs3", "Obs4", "Obs5"), S1 = c(0, 30, 40, 2), S2 = c(2, 5, 6, 7))
df3 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"), S1 = c(100, 300, 300, 400, 200), S2 = c(3, 5, 7, 8, 7))
df4 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs6"), S1 = c(110, 310, 310, 210), S2 = c(30, 50, 70, 70))
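What I currently get:
When I run the command below, the result keeps only the observations that are present in all of the data frames and drops those that are present in some (but not all) of them.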
library(dplyr)

dff <- df %>%
  inner_join(df2 %>% select(names, S1_df2 = S1), by = "names") %>%
  inner_join(df3 %>% select(names, S1_df3 = S1), by = "names") %>%
  inner_join(df4 %>% select(names, S1_df4 = S1), by = "names")
dff
  names S1 S2  S3 S4 S1_df2 S1_df3 S1_df4
1  Obs1  1  2   0  0      0    100    110
2  Obs3  2 40 135 17     30    300    310
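A quick way to see which observations the inner joins drop is to compare the set of names shared by every data frame with the set of names that appear in at least one of them. The sketch below uses only base R; the list name_sets and the variables in_all, in_any and dropped are illustrative names, not part of the original code.

```r
# Only the `names` columns matter for this check; values copied from the data above.
name_sets <- list(
  df  = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
  df2 = c("Obs1", "Obs3", "Obs4", "Obs5"),
  df3 = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
  df4 = c("Obs1", "Obs2", "Obs3", "Obs6")
)

# Observations present in every data frame: what a chain of inner joins keeps.
in_all <- Reduce(intersect, name_sets)

# Observations present in at least one data frame: what a chain of full joins keeps.
in_any <- Reduce(union, name_sets)

# Observations that the inner joins drop.
dropped <- setdiff(in_any, in_all)
```

Here in_all contains only Obs1 and Obs3, which matches the two rows returned above.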
The desired output is instead:
  names S1 S2  S3 S4 S1_df2 S1_df3 S1_df4
1  Obs1  1  2   0  0      0    100    110
2  Obs2  2 50 100 10      0    300    310  # this Obs is not present in df2, therefore add 0
3  Obs3  2 40 135 17     30    300    310
4  Obs4  0 30 256 73     40    400      0  # this Obs is not present in df4, therefore add 0
5  Obs5  1 22 303 74      2    200      0  # this Obs is not present in df4, therefore add 0
6  Obs6  0  0   0  0      0      0    210  # this Obs is not present in df1,2,3, therefore add 0
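We can change inner_join to full_join, which keeps every observation that appears in any of the data frames, and then replace the resulting NA values with 0.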
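library(dplyr)
library(tidyr)

df %>%
  full_join(df2 %>% select(names, S1_df2 = S1), by = "names") %>%
  full_join(df3 %>% select(names, S1_df3 = S1), by = "names") %>%
  full_join(df4 %>% select(names, S1_df4 = S1), by = "names") %>%
  # Observations missing from a data frame come back as NA; turn them into 0
  mutate(across(S1:S1_df4, ~ replace_na(.x, 0)))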
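If there are many data frames, writing one full_join per data frame becomes repetitive. The following is a minimal sketch of the same idea using base R's Reduce to chain the joins; the list others, the intermediate s1_cols and the result name merged are assumptions made for illustration and do not appear in the original answer.

```r
library(dplyr)
library(tidyr)

df  <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
                  S1 = c(1, 2, 2, 0, 1), S2 = c(2, 50, 40, 30, 22),
                  S3 = c(0, 100, 135, 256, 303), S4 = c(0, 10, 17, 73, 74))
df2 <- data.frame(names = c("Obs1", "Obs3", "Obs4", "Obs5"),
                  S1 = c(0, 30, 40, 2), S2 = c(2, 5, 6, 7))
df3 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
                  S1 = c(100, 300, 300, 400, 200), S2 = c(3, 5, 7, 8, 7))
df4 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs6"),
                  S1 = c(110, 310, 310, 210), S2 = c(30, 50, 70, 70))

# Data frames that contribute only their S1 column, keyed by the suffix to use.
others <- list(df2 = df2, df3 = df3, df4 = df4)

# Keep `names` and S1 from each one, renaming S1 to S1_<label>.
s1_cols <- lapply(names(others), function(label) {
  out <- others[[label]][, c("names", "S1")]
  names(out)[2] <- paste0("S1_", label)
  out
})

# Chain the full joins starting from df, then replace NA with 0 in numeric columns.
merged <- Reduce(function(x, y) full_join(x, y, by = "names"),
                 c(list(df), s1_cols)) %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
```

Using where(is.numeric) instead of the range S1:S1_df4 avoids hard-coding the last column name, so the same code should keep working if more data frames are added to others.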