Subsetting a data.table with a per-group condition

张安格

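Suppose we have the following data:

library(data.table)
tmp <- data.table(id1 = c(1,1,1,1,2,2,2,3,3), time = c(1,2,3,4,1,2,3,1,2), user_id = c(1,1,1,1,2,2,2,1,1))

For each user_id, I need all rows except those with time > 2 when id1 == max(id1), where the maximum is taken within that user_id.

I am currently using the following code, which gives me these warning messages: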

tmp1 <- tmp[, if (id1 == max(id1)) .SD[time <= 2,] else .SD  , by="user_id"] 

Warning messages:
1: In if (id1 == max(id1)) .SD[time <= 2, ] else .SD :
  the condition has length > 1 and only the first element will be used
2: In if (id1 == max(id1)) .SD[time <= 2, ] else .SD :
  the condition has length > 1 and only the first element will be used

I guess this happens because if/else is not vectorized, so I changed the code to the following:

tmp2 <- tmp[, ifelse(id1 == max(id1), .SD[time <= 2,] , .SD)  , by="user_id"]

Error in `[.data.table`(tmp, , ifelse(id1 == max(id1), .SD[time <= 2,  : 
  Supplied 4 items for column 5 of group 1 which has 6 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
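
A minimal sketch of what is probably going on in both attempts, using the same `tmp` as above. Inside each group, `id1 == max(id1)` yields one logical value per row, while `if` expects a single TRUE/FALSE (since R 4.2.0 a longer condition is an error rather than a warning). `ifelse()` is vectorized, but it returns a result shaped like its test and picks individual elements from `yes` and `no`, so it is not suited to choosing between two whole tables. The names `cond_len`, `x`, and `flags` are illustrative only.

```r
library(data.table)

tmp <- data.table(
  id1     = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  time    = c(1, 2, 3, 4, 1, 2, 3, 1, 2),
  user_id = c(1, 1, 1, 1, 2, 2, 2, 1, 1)
)

# Within each user_id group the condition has one value per row,
# whereas if () expects exactly one TRUE/FALSE.
cond_len <- tmp[, .(n_rows = .N,
                    cond_length = length(id1 == max(id1))),
                by = user_id]
print(cond_len)

# ifelse() works element by element and returns something shaped like `test`.
x <- c(1, 5, 3)
flags <- ifelse(x == max(x), "max", "other")
print(flags)
```

Because the condition is row-wise, it fits more naturally as a row filter than as a branch between two tables.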

How can I fix my code?

Thanks!

罗纳克·沙

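You can combine both conditions into a single logical vector, negate it, and use it to filter .SD within each user_id: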

library(data.table)
tmp[, .SD[!(id1 == max(id1) & time > 2)], user_id]

#   user_id id1 time
#1:       1   1    1
#2:       1   1    2
#3:       1   1    3
#4:       1   1    4
#5:       1   3    1
#6:       1   3    2
#7:       2   2    1
#8:       2   2    2
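
As a possible alternative, the same filter can be sketched with `.I`, which collects the row numbers to keep per group and then subsets the original table once. This idiom is often suggested as a lighter option than subsetting `.SD` on large tables, though that is not measured here. The names `keep_idx` and `res` are illustrative only.

```r
library(data.table)

tmp <- data.table(
  id1     = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  time    = c(1, 2, 3, 4, 1, 2, 3, 1, 2),
  user_id = c(1, 1, 1, 1, 2, 2, 2, 1, 1)
)

# Row numbers (relative to tmp) of the rows to keep, computed per user_id.
keep_idx <- tmp[, .I[!(id1 == max(id1) & time > 2)], by = user_id]$V1

# Subset the original table once using those row numbers.
res <- tmp[keep_idx]
print(res)
```

The result keeps the original column order (id1, time, user_id), with rows ordered by group, so it should contain the same eight rows as the `.SD` version.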
