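I need to define blocks of actions: for each id, I want to group together all actions that occur less than 30 days after the previous action. If 30 days or more have passed since the previous action, I increment the label by 1 (so labels 2, 3, 4 ...). Each new id starts again at 1.
Here is the data: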
dat = data.frame(
  id = c(rep(1,2), rep(16,3), rep(17,24)),
  ## day_id is the action date in %Y%m%d format - I keep it as numeric but could potentially turn it into a date.
  day_id = c(20130702, 20130121, 20131028, 20131028, 20130531, 20140513, 20140509,
             20140430, 20140417, 20140411, 20140410, 20140404,
             20140320, 20140313, 20140305, 20140224, 20140213, 20140131, 20140114,
             20130827, 20130820, 20130806, 20130730, 20130723,
             20130719, 20130716, 20130620, 20130620, 20130614),
  ## diff is the number of days between consecutive actions/day_ids (NA on the first row of each id)
  diff = c(NA,162,NA,0,150,NA,4,9,13,6,1,6,15,7,8,9,11,13,17,140,7,14,
           7,7,4,3,26,0,6),
  ## Just a flag to say whether it's a new id
  new_id = c(1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
)
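For reference, the diff column can be derived from day_id rather than typed in by hand. The following is only a sketch on the first five rows of the data: the name toy and the helper column day are illustrative assumptions, and abs() is used because the dates within each id appear in descending order.

```r
# First five rows of the data above (ids 1 and 16), without the diff column.
toy <- data.frame(
  id     = c(1, 1, 16, 16, 16),
  day_id = c(20130702, 20130121, 20131028, 20131028, 20130531)
)

# Parse the numeric %Y%m%d values into Date objects.
toy$day <- as.Date(as.character(toy$day_id), format = "%Y%m%d")

# Within each id: NA for the first action, then the absolute gap in days
# between each action and the one on the previous row.
toy$diff <- ave(as.numeric(toy$day), toy$id,
                FUN = function(x) c(NA, abs(diff(x))))

print(toy$diff)
# NA 162 NA 0 150
```

These values match the first five entries of diff in the data.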
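I have done this with a for loop and managed to avoid a loop inside the loop (see below), but I can't seem to get rid of the outer loop. Of course, with thousands of ids it gets very slow. In the example below, "call_block" is what I want to reproduce, but without the for loop. Can anyone help me get rid of the loop?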
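max_days = 30
r = NULL
# Assumes dat is sorted so that all rows of an id are contiguous.
for(i in unique(dat$id)){
  d = dat$diff[dat$id==i]
  # Block boundaries: row 1, every row whose gap is >= max_days, and one past the end.
  # which() drops the NA on the first row of each id.
  w = c(1,which(d>=max_days) , length(d)+1)
  # Number of rows in each block.
  w2 = diff(w)
  r = c(r,rep(1:(length(w)-1), w2))
}
dat$call_block = r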
Thanks!
Posting @alexis_laz's answer here to close the question:
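library(data.table)
# Label blocks within one id: the first row is always block 1, and every later
# row whose gap is >= 30 days starts a new block.
f = function(x){
  c(1, cumsum((x >= 30)[-1]) + 1)
}
df = data.table(dat)
# Apply f to the diff values of each id.
df2 = df[,list(call_block= f(diff)), by = id]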
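To see why the cumsum approach works, or to try it without data.table, here is a minimal base R sketch. The helper name block_labels, its max_days parameter, and the toy data are illustrative assumptions and not part of the original answer; ave() stands in for the by = id grouping.

```r
# Toy data in the same shape as dat: one row per action, rows of an id
# contiguous, diff = days since the previous action (NA on the first row).
toy <- data.frame(
  id   = c(1, 1, 16, 16, 16, 17, 17, 17, 17),
  diff = c(NA, 162, NA, 0, 150, NA, 4, 140, 7)
)

# Hypothetical helper: label the blocks within a single id.
block_labels <- function(d, max_days = 30) {
  # Drop the first row (it always opens block 1); TRUE marks a row that
  # starts a new block.
  starts_new <- (d >= max_days)[-1]
  # The running count of TRUEs, plus 1, is the block label of each later row.
  c(1, cumsum(starts_new) + 1)
}

# ave() applies the helper to each id's slice and returns the labels in row order.
toy$call_block <- ave(toy$diff, toy$id, FUN = block_labels)

print(toy$call_block)
# 1 2 1 1 2 1 1 2 2
```

Like the loop in the question, this relies on diff being NA only on the first row of each id; an NA elsewhere would propagate through cumsum().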