I want to subset my data frame. Here is an example:
groups names col3
group1 Sp1 OK
group1 Sp3 OK
group1 Sp7 OK
group1 Sp3 OK
group2 Sp1 OK
group2 Sp2 OK
group2 Sp3 OK
group3 Sp1 OK
group4 Sp1 OK
group4 Sp2 OK
group4 Sp2 OK
The idea is to keep, for each group, only the groups that contain both Sp1 and Sp2, and to remove the others. Here I should keep group2 and group4:
groups names col3
group2 Sp1 OK
group2 Sp2 OK
group2 Sp3 OK
group4 Sp1 OK
group4 Sp2 OK
group4 Sp2 OK
I tried something like:
df2=df %>%
group_by(groups) %>%
df$names == "Sp1" & df$names == "Sp2"
But this does not seem to be the right approach.
Thank you for your help.
We can use filter after the group_by step and check that the group contains both "Sp1" and "Sp2" by combining %in% with all:
library(dplyr)
df %>%
group_by(groups) %>%
filter(all(c("Sp1", "Sp2") %in% names))
# A tibble: 6 x 3
# Groups: groups [2]
# groups names col3
# <chr> <chr> <chr>
#1 group2 Sp1 OK
#2 group2 Sp2 OK
#3 group2 Sp3 OK
#4 group4 Sp1 OK
#5 group4 Sp2 OK
#6 group4 Sp2 OK
Or in base R, with table and subset:
subset(df, groups %in% names(which(!rowSums(!table(subset(df,
names %in% c("Sp1", "Sp2"), select = 1:2))))))
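The nested base R call above can be hard to read, so here is a minimal sketch of the same idea written in separate steps. It is an alternative formulation, not the method shown above: it uses tapply to compute one TRUE/FALSE flag per group. The variable names has_both and result are made up for illustration, and the data frame is rebuilt here only so the snippet runs on its own.

```r
# Rebuild the example data from the question so the snippet is self-contained
df <- data.frame(
  groups = rep(c("group1", "group2", "group3", "group4"), times = c(4, 3, 1, 3)),
  names  = c("Sp1", "Sp3", "Sp7", "Sp3", "Sp1", "Sp2", "Sp3", "Sp1", "Sp1", "Sp2", "Sp2"),
  col3   = "OK"
)

# One logical flag per group: TRUE when both "Sp1" and "Sp2" occur in that group
has_both <- tapply(df$names, df$groups, function(x) all(c("Sp1", "Sp2") %in% x))

# Keep every row whose group is flagged TRUE
result <- df[df$groups %in% names(has_both)[has_both], ]
result
```

With the example data this should keep the six rows of group2 and group4, matching the dplyr result.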
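Note that the problem with using & is that it checks whether 'Sp1' and 'Sp2' are both in the same row of 'names', which cannot happen because a single value cannot equal both strings. Instead, the logic should be whether both of them can be found in the 'names' of a particular group.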
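To make the difference concrete, here is a small sketch that compares the two tests on the names of a single group. The vector names_in_group is a hypothetical stand-in for the 'names' values of group2.

```r
# Hypothetical stand-in for the 'names' values of one group (group2 in the example)
names_in_group <- c("Sp1", "Sp2", "Sp3")

# Row-wise test: each element is compared on its own, so no element passes both checks
row_wise <- names_in_group == "Sp1" & names_in_group == "Sp2"
any(row_wise)

# Group-wise test: are both values present somewhere in the vector?
group_wise <- all(c("Sp1", "Sp2") %in% names_in_group)
group_wise
```

The row-wise test is FALSE for every element, while the group-wise test is TRUE, which is why the condition belongs inside all(... %in% ...) evaluated per group.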
df <- structure(list(groups = c("group1", "group1", "group1", "group1",
"group2", "group2", "group2", "group3", "group4", "group4", "group4"
), names = c("Sp1", "Sp3", "Sp7", "Sp3", "Sp1", "Sp2", "Sp3",
"Sp1", "Sp1", "Sp2", "Sp2"), col3 = c("OK", "OK", "OK", "OK",
"OK", "OK", "OK", "OK", "OK", "OK", "OK")),
class = "data.frame", row.names = c(NA,
-11L))