Hi, I have two data frames:
df1 <- data.frame(PersonId1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1),
                  PersonId2 = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 11),
                  Played_together = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1),
                  Event = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
                  Utility = c(20, -2, -5, 10, 30, 2, 1, .5, 50, -1, 60))
df2 <- data.frame(PersonId1 = c(11, 15, 9, 1),
                  PersonId2 = c(1, 5, 19, 11),
                  Played_together = c(1, 1, 1, 1),
                  Event = c(1, 2, 2, 2))
df1 looks like this:
   PersonId1 PersonId2 Played_together Event Utility
1          1        11               1     1    20.0
2          2        12               0     1    -2.0
3          3        13               0     1    -5.0
4          4        14               1     1    10.0
5          5        15               1     2    30.0
6          6        16               0     2     2.0
7          7        17               0     2     1.0
8          8        18               0     2     0.5
9          9        19               1     2    50.0
10        10        20               0     2    -1.0
11         1        11               1     2    60.0
And df2 looks like this:
  PersonId1 PersonId2 Played_together Event
1        11         1               1     1
2        15         5               1     2
3         9        19               1     2
4         1        11               1     2
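Note that df2 is not simply the rows of df1 where Played_together == 1 (for example, the pair PersonId1 = 4 and PersonId2 = 14 does not appear in df2).
Also note that although df2 is a subset of df1, the order in which the two people appear within a row of df2 is arbitrary. For example, row 1 of df1 has PersonId1 = 1 and PersonId2 = 11 for Event 1, but row 1 of df2 has PersonId1 = 11 and PersonId2 = 1 for Event 1. These two cases describe exactly the same pair, and I want to look up the value of Utility from df1 for each row of df2. The merge has to be done per event. The final output should look like this: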
  PersonId1 PersonId2 Played_together Event Utility
1        11         1               1     1      20
2        15         5               1     2      30
3         9        19               1     2      50
4         1        11               1     2      60
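I know R has a merge function, but I don't know what to do when the lookup IDs can appear in either order. If anyone could help me out a bit, I would really appreciate it. Thanks in advance.
Here's what I've got for you: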
library(dplyr)
# Join df2 to df1 twice: first with the two person IDs swapped, then as-is.
rbind(
  left_join(df2, df1,
            by = c("PersonId2" = "PersonId1", "PersonId1" = "PersonId2",
                   "Played_together" = "Played_together", "Event" = "Event")),
  left_join(df2, df1,
            by = c("PersonId1" = "PersonId1", "PersonId2" = "PersonId2",
                   "Played_together" = "Played_together", "Event" = "Event"))) %>%
  # Drop the copies of each df2 row that found no match in df1.
  filter(!is.na(Utility))
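Basically, it looks like your data sometimes has the two person IDs swapped. We can bind the two joins together (one with the IDs swapped, one with them as-is) and then filter out the rows where Utility is NA.
Your output looks like this: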
  PersonId1 PersonId2 Played_together Event Utility
1        11         1               1     1      20
2        15         5               1     2      30
3         9        19               1     2      50
4         1        11               1     2      60
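If you would rather do a single join, one alternative (a sketch only, not part of the approach above) is to give both data frames an order-independent key for the pair and merge on that with base R. The helper add_pair_key and the columns lo, hi and row are names I made up for illustration, and the sketch assumes each (pair, Played_together, Event) combination occurs at most once in df1, which holds for the sample data.

```r
df1 <- data.frame(PersonId1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1),
                  PersonId2 = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 11),
                  Played_together = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1),
                  Event = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
                  Utility = c(20, -2, -5, 10, 30, 2, 1, .5, 50, -1, 60))
df2 <- data.frame(PersonId1 = c(11, 15, 9, 1),
                  PersonId2 = c(1, 5, 19, 11),
                  Played_together = c(1, 1, 1, 1),
                  Event = c(1, 2, 2, 2))

# Hypothetical helper: store the smaller ID in `lo` and the larger in `hi`,
# so (1, 11) and (11, 1) end up with the same key.
add_pair_key <- function(df) {
  df$lo <- pmin(df$PersonId1, df$PersonId2)
  df$hi <- pmax(df$PersonId1, df$PersonId2)
  df
}

df2_keyed <- add_pair_key(df2)
df2_keyed$row <- seq_len(nrow(df2_keyed))  # remember df2's row order

# Keep only the key columns and Utility from df1 for the lookup.
lookup <- add_pair_key(df1)[, c("lo", "hi", "Played_together", "Event", "Utility")]

# Left join on the order-independent key, per event.
result <- merge(df2_keyed, lookup,
                by = c("lo", "hi", "Played_together", "Event"),
                all.x = TRUE)

# Restore df2's row order and column layout (merge sorts by the key columns).
result <- result[order(result$row),
                 c("PersonId1", "PersonId2", "Played_together", "Event", "Utility")]
rownames(result) <- NULL
print(result)
```

Because this uses all.x = TRUE, a row of df2 with no partner in df1 would stay in the result with Utility set to NA rather than being dropped, which may or may not be what you want.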