在r中按日期匹配和替换数据框中列的值

雷迪左

干杯,我有两个具有以下结构的数据帧。

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     150
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

第二数据帧DF2具有以下结构:

DF2:
Airlines           HeadQ      Date          
American           ATL        09-30-2016
Delta              ATL        03-31-2017

现在查找数据帧DF1和DF2,我想将DF1更改为以下数据帧。

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

条件是,从DF2查找DF1的Airlines和HeadQ,如果DF1 $ Date <DF2 $ Date,则将Cost_Index设置为0,否则继续使用Cost_Index。

我尝试了以下操作,但均未成功:

DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ 
        & DF1$Date < DF2$Date, 0, DF1$Cost_Index)


Warning:
1: In DF1$Airlines == DF2$Airlines : longer object
length is not a multiple of shorter object length". 
2: In<=.default(DF1$Date, DF2$Date) : longer object length is not a
multiple of shorter object length

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     0
Delta              ATL        10-31-2016     0
American           ATL        08-31-2017     200

谁能指出我正确的方向?

笔记:

str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"
智慧

你可以做一个左连接的DF1,并DF2AirlinesHeadQ列,然后更改的价值Cost_Index基础上,如果Date.x(从日DF1)小于Date.y(日期从DF2)。随着dplyr语法(如果额外的检查Date.yNA按照你的逻辑,如果有在DF2不匹配,则应用于此Cost_Index应保持不变):

library(dplyr)

# convert the Date column to Date class for comparison
df1$Date <- as.Date(df1$Date, "%m-%d-%Y")
df2$Date <- as.Date(df2$Date, "%m-%d-%Y")

df1 %>% 
       left_join(df2, by = c("Airlines", "HeadQ")) %>% 
       mutate(Cost_Index = ifelse(Date.x < Date.y & !is.na(Date.y), 0, Cost_Index), 
              Date = Date.x) %>% 
       select(-Date.x, -Date.y)

#   Airlines HeadQ Cost_Index       Date
# 1 American   PHX        220 2016-07-31
# 2 American   ATL          0 2016-08-31
# 3 American   ATL        150 2016-10-31
# 4    Delta   ATL        180 2016-10-31

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章