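I want to keep only the numbers in each group that are above a group-specific threshold. For example, I want the values in group 1 to be greater than .25, the values in group 2 to be greater than .5, and so on.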
set.seed(1234)
group <- c(rep("group 1", 30),
           rep("group 2", 30),
           rep("group 3", 30),
           rep("group 4", 30))
number <- c(runif(30, 0, .5),     #group 1 data
            runif(30, .25, .75),  #group 2 data, etc.
            runif(30, .5, 1),
            runif(30, .75, 1.25))
d <- data.frame(group = group,
                number = number)
threshold <- c(.25, .5, .75, 1)
library(dplyr)
d %>% group_by(group) %>% filter(number >= threshold)
The last line returns these warnings:
Warning messages:
1: In number >= threshold :
longer object length is not a multiple of shorter object length
2: In number >= threshold :
longer object length is not a multiple of shorter object length
3: In number >= threshold :
longer object length is not a multiple of shorter object length
4: In number >= threshold :
longer object length is not a multiple of shorter object length
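Please advise. Thanks!
It returns this warning because it compares the whole length-4 threshold vector against each group, instead of comparing the first threshold with the first group, the second threshold with the second group, and so on.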
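A minimal sketch of that recycling behaviour, using a hypothetical vector `x` of 30 identical values (standing in for one group of 30 rows), so any variation in the result can only come from the thresholds:

```r
threshold <- c(.25, .5, .75, 1)
x <- rep(0.6, 30)  # hypothetical: 30 identical values, like one group

# R recycles the length-4 threshold to length 30 (.25, .5, .75, 1, .25, ...).
# Because 30 is not a multiple of 4, this line emits the
# "longer object length is not a multiple of shorter object length" warning.
cmp <- x >= threshold

# Identical inputs give different answers depending on their position,
# so the thresholds are cycled row by row rather than matched to the group.
print(cmp[1:8])
# TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE
```

Inside `group_by()` the comparison is evaluated once per group, which is why the question shows one warning per group.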
set.seed(1234)
group <- c(rep("group 1", 30),
           rep("group 2", 30),
           rep("group 3", 30),
           rep("group 4", 30))
number <- c(runif(30, 0, .5),     #group 1 data
            runif(30, .25, .75),  #group 2 data, etc.
            runif(30, .5, 1),
            runif(30, .75, 1.25))
d <- data.frame(group = group,
                number = number)
threshold <- data.frame(group = c("group 1", "group 2", "group 3", "group 4"),
                        threshold = c(.25, .5, .75, 1))
library(dplyr)
d %>% left_join(threshold, by = 'group') %>%
  filter(number >= threshold)
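By creating a lookup table and joining it to d, the result gains a new column, threshold, that holds the correct value for each group. When we then apply the filter, each value is compared against its own group's threshold. This way we don't even need group_by!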
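As an alternative sketch (not part of the original answer), the same per-group lookup can be done without a join by indexing a named vector with the group column. This assumes `threshold` is redefined as a named vector whose names exactly match the values in `group`; `as.character()` guards against `group` being a factor, since a factor index would select by its integer codes rather than by name.

```r
library(dplyr)

set.seed(1234)
d <- data.frame(
  group  = rep(paste("group", 1:4), each = 30),  # "group 1" ... "group 4"
  number = c(runif(30, 0, .5),                   # same draws as the question
             runif(30, .25, .75),
             runif(30, .5, 1),
             runif(30, .75, 1.25))
)

# Named vector: one threshold per group, looked up by name
threshold <- c("group 1" = .25, "group 2" = .5, "group 3" = .75, "group 4" = 1)

# threshold[as.character(group)] returns one threshold per row,
# so the comparison is row-by-row with no recycling
res <- d %>% filter(number >= threshold[as.character(group)])
```

Both approaches keep the same rows; the join is easier to extend with more per-group columns, while the named vector avoids building a second data frame.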