Let's say I have datasets like these:
origin=data.frame(Date=as.Date(c("2016-08-05","2016-08-04","2016-08-03")),
L=c(1,2,3),
Type=c("H","L","H"))
Date L Type
1 2016-08-05 1 H
2 2016-08-04 2 L
3 2016-08-03 3 H
end=data.frame(Date=as.Date(c("2016-08-05","2016-08-04","2016-08-03","2016-08-02","2016-08-01")),
N=c(50,40,30,20,10),
Name=c("CA","CB","CC","CD","CE"),
Vol=c(2,1,2,2,3),
Act=c(0.1,0.2,0.3,0.2,0.2))
Date N Name Vol Act
1 2016-08-05 50 CA 2 0.1
2 2016-08-04 40 CB 1 0.2
3 2016-08-03 30 CC 2 0.3
4 2016-08-02 20 CD 2 0.2
5 2016-08-01 10 CE 3 0.2
I want something like this:
Date L Type N Name Vol Act
3 2016-08-05 1 H 50 CA 2 0.1
3 2016-08-05 1 H 40 CB 1 0.2
3 2016-08-05 1 H 30 CC 2 0.3
2 2016-08-04 2 L 40 CB 1 0.2
2 2016-08-04 2 L 30 CC 2 0.3
2 2016-08-04 2 L 20 CD 2 0.2
1 2016-08-03 3 H 30 CC 2 0.3
1 2016-08-03 3 H 20 CD 2 0.2
1 2016-08-03 3 H 10 CE 3 0.2
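I want to keep the original Date column from origin. In the merge, each origin row should be matched with the rows of end for the same date and for the two previous dates, much like merging in a loop. The approaches in other posts match only the common values, which gives a 3-row result: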
merge(x = origin, y = end, by = "Date")
Date L Type N Name Vol Act
1 2016-08-03 3 H 30 CC 2 0.3
2 2016-08-04 2 L 40 CB 1 0.2
3 2016-08-05 1 H 50 CA 2 0.1
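That is very different from what I need: it does not merge the two data frames by the values of the current row and the previous rows, so I can't figure out how to proceed.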
It looks like foverlaps from data.table suits this job:
# prepare data: foverlaps joins on intervals, so each table needs a start and an end column
library(data.table)
setDT(origin)[, DateStart := Date - 2]
setDT(end)[, DateStart := Date]
setkey(origin, DateStart, Date)
# join two tables with foverlaps and remove subsidiary columns
foverlaps(end, origin, type = "within")[, `:=` (DateStart = NULL, i.Date = NULL, i.DateStart = NULL)][order(Date)]
# Date L Type N Name Vol Act
# 1: 2016-08-03 3 H 30 CC 2 0.3
# 2: 2016-08-03 3 H 20 CD 2 0.2
# 3: 2016-08-03 3 H 10 CE 3 0.2
# 4: 2016-08-04 2 L 40 CB 1 0.2
# 5: 2016-08-04 2 L 30 CC 2 0.3
# 6: 2016-08-04 2 L 20 CD 2 0.2
# 7: 2016-08-05 1 H 50 CA 2 0.1
# 8: 2016-08-05 1 H 40 CB 1 0.2
# 9: 2016-08-05 1 H 30 CC 2 0.3
Or use the non-equi join feature of data.table (version 1.9.7 or later):
setDT(origin)[, `:=` (DateEnd = Date, StartDate = Date - 2)]
origin[setDT(end), on = .(DateEnd >= Date, StartDate <= Date), allow.cartesian = TRUE]
# Date L Type DateEnd StartDate N Name Vol Act
# 1: 2016-08-05 1 H 2016-08-05 2016-08-05 50 CA 2 0.1
# 2: 2016-08-04 2 L 2016-08-04 2016-08-04 40 CB 1 0.2
# 3: 2016-08-05 1 H 2016-08-04 2016-08-04 40 CB 1 0.2
# 4: 2016-08-03 3 H 2016-08-03 2016-08-03 30 CC 2 0.3
# 5: 2016-08-04 2 L 2016-08-03 2016-08-03 30 CC 2 0.3
# 6: 2016-08-05 1 H 2016-08-03 2016-08-03 30 CC 2 0.3
# 7: 2016-08-03 3 H 2016-08-02 2016-08-02 20 CD 2 0.2
# 8: 2016-08-04 2 L 2016-08-02 2016-08-02 20 CD 2 0.2
# 9: 2016-08-03 3 H 2016-08-01 2016-08-01 10 CE 3 0.2
Removing the helper columns should be straightforward.
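As a minimal sketch of that cleanup, assuming a data.table release that supports non-equi joins (1.9.8 or later on CRAN): the block below rebuilds the two example tables, runs the same non-equi join, drops the two helper columns by reference, and sorts the rows into the order shown in the question. The name res is chosen here for illustration and does not appear in the original answer.

```r
library(data.table)

# Rebuild the example data as data.tables
origin <- data.table(Date = as.Date(c("2016-08-05", "2016-08-04", "2016-08-03")),
                     L    = c(1, 2, 3),
                     Type = c("H", "L", "H"))
end <- data.table(Date = as.Date(c("2016-08-05", "2016-08-04", "2016-08-03",
                                   "2016-08-02", "2016-08-01")),
                  N    = c(50, 40, 30, 20, 10),
                  Name = c("CA", "CB", "CC", "CD", "CE"),
                  Vol  = c(2, 1, 2, 2, 3),
                  Act  = c(0.1, 0.2, 0.3, 0.2, 0.2))

# Helper columns describing the 3-day window that ends at each origin date
origin[, `:=`(DateEnd = Date, StartDate = Date - 2)]

# Non-equi join: keep every end row whose Date falls inside an origin window
res <- origin[end, on = .(DateEnd >= Date, StartDate <= Date),
              allow.cartesian = TRUE]

# Drop the helper columns by reference (res is a hypothetical name)
res[, c("DateEnd", "StartDate") := NULL]

# Sort by origin date, then by N, both descending, as in the desired output
setorderv(res, c("Date", "N"), order = c(-1L, -1L))
print(res)
```

Because origin is modified by reference, it keeps the DateEnd and StartDate columns after the join; the same := NULL assignment on origin removes them if they are no longer needed.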