R-在满足数据框之间的某些条件的情况下替换列值

布雷特劳森

我有两个看起来像这样的数据框:

> df_X
            X1 X2 X3 X5
1   2017-05-02 21  1 NA
2   2017-05-02 10  1 NA
3   2017-05-02  5  1 NA
4   2017-05-02  5  1 NA
5   2017-05-02 25  1 NA
6   2017-05-02 20  1 NA
7   2017-05-04 25  1 NA
8   2017-05-08 25  1 NA
9   2017-05-10 20  1 NA
10  2017-05-13 30  1 NA
11  2017-05-17  5  1 NA
12  2017-05-20 25  1 NA
13  2017-05-23 15  1 NA
14  2017-05-25 20  1 NA
15  2017-05-27 35  1 NA
16  2017-05-30 30  1 NA
17  2017-06-01 10  1 NA
18  2017-06-03 10  1 NA
19  2017-06-05  5  1 NA
20  2017-06-08 10  1 NA

> df_Y
   Y1        Y2         Y3         Y4
1   1 0.6656641 2017-08-01 2017-08-02
2   1 1.0285626 2017-08-17 2017-08-18
3   1 1.1905584 2017-08-21 2017-08-22
4   1 1.1026008 2017-08-24 2017-08-25
5   1 1.1570583 2017-08-28 2017-08-29
6   1 1.0676820 2017-08-31 2017-09-01
7   2 1.1003816 2017-08-31 2017-09-01
8   2 1.1901956 2017-09-04 2017-09-05
9   1 1.1636756 2017-09-05 2017-09-05
10  1 1.1724968 2017-09-07 2017-09-07
11  2 1.1335978 2017-09-08 2017-09-09
12  1 1.0873912 2017-09-08 2017-09-08
13  1 1.1238168 2017-09-11 2017-09-11
14  2 1.1564612 2017-09-12 2017-09-13
15  2 1.1688628 2017-09-14 2017-09-15
16  1 1.2326152 2017-09-14 2017-09-15
17  2 1.2014378 2017-09-16 2017-09-16
18  2 1.1911753 2017-09-19 2017-09-19
19  1 1.1445341 2017-09-19 2017-09-20
20  1 1.1299295 2017-09-21 2017-09-22

我想替换的值X5与值Y2,其中X1介于Y3Y4,和X3比赛Y1

使用循环很容易实现:

for (i in 1:NROW(df_Y)) {
  df_X$X5[(df_X$X1 >= df_Y$Y3[i]) &
            (df_X$X1 <= df_Y$Y4[i]) & (df_X$X3 == df_Y$Y1[i])] <- df_Y$Y2[i]
}

> df_X_result
            X1 X2 X3        X5
1   2017-05-02 21  1        NA
            ...
26  2017-08-13 11  1        NA
27  2017-08-18 10  1 1.0285626
28  2017-08-21  5  1 1.1905584
29  2017-08-23  5  1        NA
30  2017-08-25 20  1 1.1026008
31  2017-08-28 10  1 1.1570583
32  2017-08-30  3  1        NA
33  2017-08-31 10  1 1.0676820
34  2017-09-04 18  1        NA
35  2017-09-05 25  1 1.1636756
36  2017-09-07 30  1 1.1724968
37  2017-09-08 20  1 1.0873912
38  2017-09-11 20  1 1.1238168
39  2017-09-13  5  1        NA
40  2017-09-14 25  1 1.2326152
41  2017-09-15 30  1 1.2326152

该循环需要相当长的时间才能完成整个数据帧,因此我想知道是否存在矢量化解决方案,可能使用group_by()

df_X <- structure(list(X1 = c("2017-05-02", "2017-05-02", "2017-05-02", 
"2017-05-02", "2017-05-02", "2017-05-02", "2017-05-04", "2017-05-08", 
"2017-05-10", "2017-05-13", "2017-05-17", "2017-05-20", "2017-05-23", 
"2017-05-25", "2017-05-27", "2017-05-30", "2017-06-01", "2017-06-03", 
"2017-06-05", "2017-06-08", "2017-06-10", "2017-06-14", "2017-06-16", 
"2017-07-15", "2017-08-09", "2017-08-13", "2017-08-18", "2017-08-21", 
"2017-08-23", "2017-08-25", "2017-08-28", "2017-08-30", "2017-08-31", 
"2017-09-04", "2017-09-05", "2017-09-07", "2017-09-08", "2017-09-11", 
"2017-09-13", "2017-09-14", "2017-09-15", "2017-09-18", "2017-09-19", 
"2017-09-21", "2017-09-23", "2017-09-25", "2017-09-26", "2017-09-28", 
"2017-09-29", "2017-09-30", "2017-10-02", "2017-10-03", "2017-10-04", 
"2017-10-05", "2017-10-06", "2017-10-09", "2017-10-10", "2017-10-11", 
"2017-10-12", "2017-10-12", "2017-10-14", "2017-10-16", "2017-10-18", 
"2017-10-20", "2017-10-21", "2017-10-24", "2017-10-27", "2017-10-28", 
"2017-10-31", "2017-11-02", "2017-11-03", "2017-11-06", "2017-11-08", 
"2017-05-02", "2017-05-02", "2017-05-02", "2017-05-02", "2017-05-02", 
"2017-05-03", "2017-05-05", "2017-05-09", "2017-05-12", "2017-05-16", 
"2017-05-19", "2017-05-22", "2017-05-25", "2017-05-27", "2017-05-29", 
"2017-05-31", "2017-06-02", "2017-06-05", "2017-06-07", "2017-06-14", 
"2017-06-16", "2017-06-19", "2017-09-01", "2017-09-05", "2017-09-07", 
"2017-09-08", "2017-09-11", "2017-09-12", "2017-09-14", "2017-09-15", 
"2017-09-18", "2017-09-19", "2017-09-21", "2017-09-22", "2017-09-25", 
"2017-09-27", "2017-09-29", "2017-09-30", "2017-10-02", "2017-10-03", 
"2017-10-04", "2017-10-05", "2017-10-06", "2017-10-09", "2017-10-09", 
"2017-10-11", "2017-10-11", "2017-10-12", "2017-10-14", "2017-10-16", 
"2017-10-17", "2017-10-19", "2017-10-20", "2017-10-21", "2017-10-25", 
"2017-10-27", "2017-10-31", "2017-11-01", "2017-11-02", "2017-11-03", 
"2017-11-06", "2017-11-08"), X2 = c(21, 10, 5, 5, 25, 20, 25, 
25, 20, 30, 5, 25, 15, 20, 35, 30, 10, 10, 5, 10, 5, 10, 15, 
7, 23, 11, 10, 5, 5, 20, 10, 3, 10, 18, 25, 30, 20, 20, 5, 25, 
30, 15, 10, 20, 29, 25, 30, 10, 10, 5, 5, 30, 15, 25, 25, 5, 
30, 5, 10, 3, 20, 55, 10, 15, 30, 20, 30, 10, 20, 25, 25, 15, 
20, 35, 20, 10, 15, 10, 15, 5, 5, 5, 25, 20, 5, 15, 30, 20, 20, 
10, 20, 5, 20, 15, 5, 5, 5, 25, 25, 10, 10, 5, 5, 30, 25, 10, 
5, 20, 10, 30, 20, 25, 10, 20, 15, 10, 5, 30, 5, 10, 30, 3, 10, 
10, 40, 20, 24, 30, 40, 15, 10, 10, 5, 15, 30), X3 = c(1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2), X5 = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("X1", 
"X2", "X3", "X5"), row.names = c(NA, 135L), class = "data.frame")

df_Y <- structure(list(Y1 = c("1", "1", "1", "1", "1", "1", "2", "2", 
"1", "1", "2", "1", "1", "2", "2", "1", "2", "2", "1", "1", "2", 
"1", "1", "2", "2", "1", "2", "2", "1", "1", "2", "1", "1", "2", 
"2", "1", "2", "2", "2", "1", "1", "1", "2", "2", "1", "1", "2", 
"1", "1", "2", "2", "2", "2", "2", "2", "2"), Y2 = c(0.665664077829585, 
1.02856261889775, 1.19055836913068, 1.10260084679125, 1.15705828895831, 
1.06768199117752, 1.10038159049195, 1.19019555343219, 1.16367557830325, 
1.17249680795712, 1.13359781336505, 1.08739115948259, 1.12381677088993, 
1.15646116283926, 1.16886280825931, 1.23261515446765, 1.20143782448638, 
1.19117526432255, 1.14453414539176, 1.12992954478391, 1.00942359172597, 
1.36304817720811, 0.977737663980795, 0.803038132093719, 0.658557316689605, 
1.08897993760763, 1.02561012617194, 0.953197521830857, 1.10932969513697, 
1.14650953305854, 1.08216085976536, 1.04156195299253, 0.945898621136812, 
0.949689535186447, 1.00475292030475, 1.1143173236383, 1.00267210653824, 
0.887215247393604, 0.974738456110934, 1.05929723094296, 1.02576829587045, 
1.02233864565999, 0.984999990038355, 1.06285567360253, 1.04974288902827, 
1.0846639113405, 1.11423291175236, 0.851355064479879, 0.749599945595823, 
0.935991720991809, 0.895595684187917, 0.9096390430551, 1.07024512065878, 
0.999910689243934, 0, NA), Y3 = c("2017-08-01", "2017-08-17", 
"2017-08-21", "2017-08-24", "2017-08-28", "2017-08-31", "2017-08-31", 
"2017-09-04", "2017-09-05", "2017-09-07", "2017-09-08", "2017-09-08", 
"2017-09-11", "2017-09-12", "2017-09-14", "2017-09-14", "2017-09-16", 
"2017-09-19", "2017-09-19", "2017-09-21", "2017-09-21", "2017-09-22", 
"2017-09-25", "2017-09-25", "2017-09-27", "2017-09-27", "2017-09-28", 
"2017-09-30", "2017-09-30", "2017-10-02", "2017-10-03", "2017-10-02", 
"2017-10-04", "2017-10-05", "2017-10-06", "2017-10-06", "2017-10-07", 
"2017-10-09", "2017-10-10", "2017-10-07", "2017-10-10", "2017-10-11", 
"2017-10-11", "2017-10-12", "2017-10-14", "2017-10-16", "2017-10-16", 
"2017-10-17", "2017-10-19", "2017-10-18", "2017-10-20", "2017-10-20", 
"2017-10-21", "2017-10-25", "2017-11-02", "2017-11-06"), Y4 = c("2017-08-02", 
"2017-08-18", "2017-08-22", "2017-08-25", "2017-08-29", "2017-09-01", 
"2017-09-01", "2017-09-05", "2017-09-05", "2017-09-07", "2017-09-09", 
"2017-09-08", "2017-09-11", "2017-09-13", "2017-09-15", "2017-09-15", 
"2017-09-16", "2017-09-19", "2017-09-20", "2017-09-22", "2017-09-22", 
"2017-09-23", "2017-09-26", "2017-09-26", "2017-09-28", "2017-09-28", 
"2017-09-28", "2017-09-30", "2017-09-30", "2017-10-02", "2017-10-03", 
"2017-10-03", "2017-10-04", "2017-10-05", "2017-10-07", "2017-10-06", 
"2017-10-07", "2017-10-09", "2017-10-10", "2017-10-09", "2017-10-10", 
"2017-10-12", "2017-10-11", "2017-10-12", "2017-10-14", "2017-10-17", 
"2017-10-16", "2017-10-17", "2017-10-19", "2017-10-19", "2017-10-20", 
"2017-10-21", "2017-10-25", "2017-10-28", "2017-11-04", NA)), .Names = c("Y1", 
"Y2", "Y3", "Y4"), row.names = c(NA, 56L), class = "data.frame")
塔拉特

使用data.table的非平等联接功能时,这很简单:

library(data.table)

# convert to data.table
setDT(df_X)
setDT(df_Y)

# data preparation:
df_X[, X1 := as.IDate(X1)]
df_Y[, c("Y3", "Y4") := lapply(.SD, as.IDate), .SDcols = c("Y3", "Y4")]
df_Y[, Y1 := as.numeric(Y1)]
df_X[, X5 := as.numeric(X5)]

# non-equi join + update by reference
df_X[df_Y, on = .(X3 = Y1, X1 >= Y3, X1 <= Y4), X5 := i.Y2]  #  <-- the main part

# convert back to data.frame
setDF(df_X)
setDF(df_Y)

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章

R满足3个条件的情况下比较一列中的行值

如何在不覆盖的情况下替换 R 数据框中的值?

如何获得满足数据框中某些条件的值

仅在某些情况下改变 R 中的数据帧

在不使用 for 循环的情况下匹配 R 数据帧之间的值

如果特定值满足 R 中的条件,则删除数据框中的列

R即使在满足条件的情况下,也不会显示闪亮的条件面板

R:在有条件的情况下将数据从行添加到列

如何在满足数据框中条件的行中提取某些值?

在某些情况下,如何根据不同的数据框列替换一个数据框的列值?

R //结合应用和R中的条件查找函数//在给定Col B和Col C满足要求的情况下求和col A中的值

如果在R中不满足条件的情况下增加每行变量的if语句

为 R 中的大型数据框按行查找和替换条件列值

使用R有条件地替换数据框中的列值

使用 R 中的数据框列替换条件列标题

在另一个变量的情况下,用R data.table替换多列中的值

对于满足R中特定条件的情况,如何用0替换缺失值?

R-根据另一个数据框的列中的值满足的条件,在数据框列中添加值(由公式得出)

R:为满足两个条件的某些数据设置一个值

R:使用条件根据数据框Y中的值替换数据框X中的值

ifelse() 在某些情况下缺失值但不是全部 /R

如何在R中有多个新列是有条件的情况下使用R中的多个列来表示一列的值?

R替换数据框中的列时忽略值

检查并替换R数据框中的列值

R编程:替换数据框列中的缺失值

在R中具有某些条件的情况下为循环中的变量分配值

R:如何在不替换且没有连续相同值的情况下进行采样

计算符号的出现并返回带有计数的列,但仅在某些情况下,在R中

如果在R中满足条件,则用多列中的NA替换值