I have two data frames. The first, df1, looks like this:
Day Element Incident
1 2020-04-06 3101 Check incident by SOILING
2 2020-04-02 3102 Check alarm 5662
3 2020-05-21 3101 Check energy loss by METEO ERROR
4 2020-04-02 3202 Check ACDC grid
The other, df2, looks like this:
Day Element Incident Energy_loss
1 2020-04-06 3101 SOILING 0.05
2 2020-04-14 3101 SOILING 0.01
3 2020-05-21 3101 METEO ERROR 0.11
4 2020-06-15 3102 METEO ERROR 0.03
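I want to merge them on the columns Day, Element and Incident. Day and Element should match as usual, but for Incident I need a substring match: a row matches when Incident in df1 contains Incident from df2. Rows of df1 that have no match in df2 can keep NaN in the Energy loss column.
I tried a regular merge, but it does not work because one of the merge conditions is a substring match.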
My expected output is:
Day Element Incident Energy loss
1 2020-04-06 3101 Check incident by SOILING 0.05
2 2020-04-02 3102 Check alarm 5662 Nan
3 2020-05-21 3101 Check energy loss by METEO ERROR 0.11
4 2020-04-02 3202 Check ACDC grid Nan
We can use regex_left_join from the fuzzyjoin package:
library(dplyr)
library(fuzzyjoin)
regex_left_join(df1, df2, by = c('Day', 'Element', 'Incident')) %>%
select(Day = Day.x, Element = Element.x, Incident = Incident.x, Energy_loss)
Output:
# Day Element Incident Energy_loss
#1 2020-04-06 3101 Check incident by SOILING 0.05
#2 2020-04-02 3102 Check alarm 5662 NA
#3 2020-05-21 3101 Check energy loss by METEO ERROR 0.11
#4 2020-04-02 3202 Check ACDC grid NA
Data:
df1 <- structure(list(Day = c("2020-04-06", "2020-04-02", "2020-05-21",
"2020-04-02"), Element = c(3101L, 3102L, 3101L, 3202L),
Incident = c("Check incident by SOILING",
"Check alarm 5662", "Check energy loss by METEO ERROR", "Check ACDC grid"
)), class = "data.frame", row.names = c("1", "2", "3", "4"))
df2 <- structure(list(Day = c("2020-04-06", "2020-04-14", "2020-05-21",
"2020-06-15"), Element = c(3101L, 3101L, 3101L, 3102L), Incident = c("SOILING",
"SOILING", "METEO ERROR", "METEO ERROR"), Energy_loss = c(0.05,
0.01, 0.11, 0.03)), class = "data.frame", row.names = c("1",
"2", "3", "4"))
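One thing to keep in mind with the regex_left_join call above is that it applies regular-expression matching to every column listed in by, so Day and Element are also compared as patterns rather than for strict equality, and any regex metacharacters in df2's Incident would be interpreted as such. If that is a concern, the following is a minimal base R sketch of an alternative, not the method from the answer: it joins on exact Day and Element first, then keeps only candidate pairs where df2's Incident is a literal substring of df1's Incident. The names left, row_id, cand, keep, hits and result are hypothetical helpers introduced for illustration.

```r
# Sample data, same values as in the question
df1 <- data.frame(
  Day = c("2020-04-06", "2020-04-02", "2020-05-21", "2020-04-02"),
  Element = c(3101L, 3102L, 3101L, 3202L),
  Incident = c("Check incident by SOILING", "Check alarm 5662",
               "Check energy loss by METEO ERROR", "Check ACDC grid"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Day = c("2020-04-06", "2020-04-14", "2020-05-21", "2020-06-15"),
  Element = c(3101L, 3101L, 3101L, 3102L),
  Incident = c("SOILING", "SOILING", "METEO ERROR", "METEO ERROR"),
  Energy_loss = c(0.05, 0.01, 0.11, 0.03),
  stringsAsFactors = FALSE
)

# Tag each df1 row so it can be recovered after the join (hypothetical helper column)
left <- transform(df1, row_id = seq_len(nrow(df1)))

# Step 1: exact inner join on Day and Element; the shared Incident column
# is split into Incident.x (from df1) and Incident.y (from df2)
cand <- merge(left, df2, by = c("Day", "Element"))

# Step 2: keep candidate pairs where df2's Incident occurs literally in df1's Incident
keep <- vapply(
  seq_len(nrow(cand)),
  function(i) grepl(cand$Incident.y[i], cand$Incident.x[i], fixed = TRUE),
  logical(1)
)
hits <- cand[keep, c("row_id", "Energy_loss")]

# Step 3: left join back onto df1 so unmatched rows get NA in Energy_loss
result <- merge(left, hits, by = "row_id", all.x = TRUE)
result <- result[order(result$row_id), c("Day", "Element", "Incident", "Energy_loss")]
rownames(result) <- NULL
print(result)
```

As with any left join, a df1 row that matches several df2 rows would appear once per match in result.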