I have a data frame (a data.table) like the following:
library(data.table)

x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
                Type=c("put","call","put","call","call","put","call","put","call","put","call"),
                Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
                Other=sample(20,11))  # random, so the Other values differ between runs
    Tickers Type Strike Other
 1:       A  put   35.0     6
 2:       A call   37.5     5
 3:       A  put   37.5    13
 4:       B call   10.0    15
 5:       B call   11.0    12
 6:       B  put   11.0     4
 7:       B call   12.0    20
 8:       D  put   40.0     7
 9:       D call   40.0    11
10:       D  put   42.0    10
11:       D call   42.0     1
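I'm trying to analyze a subset of the data. The subset I want consists of the rows that share the same ticker and strike. However, I only want those rows when both a put and a call exist under type for that ticker and strike. Using the data above as an example, I would like to return the following result: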
x[c(2,3,5,6,8:11),]
   Tickers Type Strike Other
1:       A call   37.5     5
2:       A  put   37.5    13
3:       B call   11.0    12
4:       B  put   11.0     4
5:       D  put   40.0     7
6:       D call   40.0    11
7:       D  put   42.0    10
8:       D call   42.0     1
I'm not sure what the best way to do this is. My thought process was to create another column vector, for example:
x$id <- paste(x$Tickers,x$Strike,sep="_")
and then use this vector to extract only the rows whose id occurs more than once:
x[x$id %in% x$id[duplicated(x$id)],]
   Tickers Type Strike Other     id
1:       A call   37.5     5 A_37.5
2:       A  put   37.5    13 A_37.5
3:       B call   11.0    12   B_11
4:       B  put   11.0     4   B_11
5:       D  put   40.0     7   D_40
6:       D call   40.0    11   D_40
7:       D  put   42.0    10   D_42
8:       D call   42.0     1   D_42
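I'm not sure how efficient this is, since my actual data contains many more rows. Also, this solution does not check the condition that type contains both a put and a call.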
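The title could probably be worded much better, my apologies.
EDIT: Having checked out the post "Finding ALL duplicate rows, including 'elements with smaller subscripts'", I could also use this solution: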
x$id <- paste(x$Tickers,x$Strike,sep="_")
x[duplicated(x$id) | duplicated(x$id, fromLast=TRUE),]
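For what it's worth, the data.table method for duplicated() accepts a by argument, so the same duplicate filter can be sketched without pasting together an id column. This is only a minimal sketch: key_cols, dup and res are illustrative names, the Other values are copied from the printed example instead of being sampled, and like the version above it still does not check that both a put and a call are present.

```r
library(data.table)

# Example data from the question, with Other fixed to the printed values
x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35, 37.5, 37.5, 10, 11, 11, 12, 40, 40, 42, 42),
  Other   = c(6, 5, 13, 15, 12, 4, 20, 7, 11, 10, 1)
)

# Columns that define a "same ticker and strike" group (illustrative name)
key_cols <- c("Tickers", "Strike")

# Flag every row whose (Tickers, Strike) pair occurs more than once;
# scanning from both ends keeps the first occurrence as well.
dup <- duplicated(x, by = key_cols) | duplicated(x, by = key_cols, fromLast = TRUE)

res <- x[dup]
print(res)
```

On this example the filter keeps the same eight rows as x[c(2,3,5,6,8:11),], but only because every repeated pair here happens to contain both types.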
You could try something like:
x[, select := (.N >= 2 & all(c("put", "call") %in% unique(Type))), by = .(Tickers, Strike)][which(select)]
#   Tickers Type Strike Other select
#1:       A call   37.5    17   TRUE
#2:       A  put   37.5    16   TRUE
#3:       B call   11.0    11   TRUE
#4:       B  put   11.0    20   TRUE
#5:       D  put   40.0     1   TRUE
#6:       D call   40.0    12   TRUE
#7:       D  put   42.0     6   TRUE
#8:       D call   42.0     2   TRUE
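A variant of the same grouping idea, in case the extra select column is unwanted: return .SD only for groups that contain both types. This is a rough sketch, not a benchmarked solution; res is an illustrative name and the Other values are fixed to those printed in the question so the result is reproducible. Note that the grouping columns come first in the result, so the column order differs from x.

```r
library(data.table)

# Example data from the question, with Other fixed to the printed values
x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35, 37.5, 37.5, 10, 11, 11, 12, 40, 40, 42, 42),
  Other   = c(6, 5, 13, 15, 12, 4, 20, 7, 11, 10, 1)
)

# For each (Tickers, Strike) group, keep its rows only when both
# "put" and "call" appear in Type; other groups contribute nothing.
res <- x[, if (all(c("put", "call") %in% Type)) .SD, by = .(Tickers, Strike)]
print(res)
```

Because a group needs two distinct Type values to pass, the separate .N >= 2 check is implied by the put/call test.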
Another idea might be to use a join:
x[x, on = .(Tickers, Strike), select := (length(Type) >= 2 & all(c("put", "call") %in% Type)),by = .EACHI][which(select)]
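The join idea can also be split into two steps, which may be easier to read: first work out which (Tickers, Strike) pairs qualify, then join x against just those pairs. This is a hedged sketch under the same example data; ok_groups and res are illustrative names, and no claim is made about its speed relative to the one-liner above.

```r
library(data.table)

# Example data from the question, with Other fixed to the printed values
x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35, 37.5, 37.5, 10, 11, 11, 12, 40, 40, 42, 42),
  Other   = c(6, 5, 13, 15, 12, 4, 20, 7, 11, 10, 1)
)

# Step 1: one row per (Tickers, Strike) pair, keeping only pairs
# whose Type values include both "put" and "call".
ok_groups <- x[, .(ok = all(c("put", "call") %in% Type)),
               by = .(Tickers, Strike)][ok == TRUE, .(Tickers, Strike)]

# Step 2: join x to the qualifying pairs; only matching rows of x are returned.
res <- x[ok_groups, on = .(Tickers, Strike)]
print(res)
```

The intermediate ok_groups table has one row per qualifying pair, so it can be inspected or reused before filtering the full data.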
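I'm not sure how to handle the grouping, since you need to make sure every group has both a "call" and a "put". I was thinking about using keys, but I haven't been able to incorporate the "call"/"put" aspect yet.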