I have this data frame (named OEM_final). This is its structure:
str(OEM_final)
'data.frame': 2265 obs. of 17 variables:
$ dia_hora_OEM : POSIXct, format: "2019-12-31 06:40:13" "2019-12-31 06:43:00" "2019-12-31 07:11:30" "2019-12-31 07:18:30" ...
$ coche_OEM : Factor w/ 6 levels "356232050832996",..: 3 3 3 3 3 3 3 3 6 6 ...
$ DTC_OEM_dec64: chr "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ ...
$ rowname : Factor w/ 2265 levels "1","10","100",..: 1 1112 1489 1600 1711 1822 1933 2044 2155 2 ...
$ B1182 : Factor w/ 2 levels "B1182","NULL": 1 1 1 1 1 1 1 1 2 2 ...
$ B124D : Factor w/ 2 levels "B124D","NULL": 1 1 1 1 1 1 1 1 2 2 ...
$ NA. : Factor w/ 6 levels "c(NA, NA, NA, NA, NA, NA, NA, NA)",..: 3 3 3 3 3 3 3 3 1 1 ...
$ P2000 : Factor w/ 2 levels "c(\"P2000\", \"P2000\", \"P2000\")",..: 2 2 2 2 2 2 2 2 2 2 ...
$ U3003 : Factor w/ 2 levels "NULL","U3003": 1 1 1 1 1 1 1 1 1 1 ...
$ B1D01 : Factor w/ 3 levels "B1D01","c(\"B1D01\", \"B1D01\")",..: 3 3 3 3 3 3 3 3 3 3 ...
$ U0155 : Factor w/ 2 levels "NULL","U0155": 1 1 1 1 1 1 1 1 1 1 ...
$ C1B00 : Factor w/ 2 levels "C1B00","NULL": 2 2 2 2 2 2 2 2 2 2 ...
$ P037D : Factor w/ 2 levels "NULL","P037D": 1 1 1 1 1 1 1 1 1 1 ...
$ P0616 : Factor w/ 2 levels "NULL","P0616": 1 1 1 1 1 1 1 1 1 1 ...
$ P0562 : Factor w/ 2 levels "NULL","P0562": 1 1 1 1 1 1 1 1 1 1 ...
$ U0073 : Factor w/ 2 levels "NULL","U0073": 1 1 1 1 1 1 1 1 1 1 ...
$ P0138 : Factor w/ 2 levels "c(\"P0138\", \"P0138\", \"P0138\")",..: 2 2 2 2 2 2 2 2 2 2 ...
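I want to find the earliest date (dia_hora_OEM) within each group when the data is grouped by two factors. The two factors are:
coche_OEM
each of the columns from P2000 through the last one (P0138), one at a time.
So the group_by() calls should be: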
group_by(coche_OEM, P2000)
group_by(coche_OEM, U3003)
group_by(coche_OEM, B1D01)
group_by(coche_OEM, U0155)
I tried different ways to do this:
for loop:
for (DTC in c(U3003, P2000)) {
OEM_final %>%
group_by(DTC, coche_OEM) %>%
filter(dia_hora_OEM == min(dia_hora_OEM))
}
However, I get an error saying:
Error in c(U3003, P2000) : object 'U3003' not found
lapply:
In this case, I created a function:
groupCombDTC <- function(x) {
OEM_final %>%
group_by(coche_OEM, x) %>%
filter(dia_hora_OEM == min(dia_hora_OEM))
}
Then I ran lapply():
lapply(colnames(OEM_final)[8:17], groupCombDTC)
I get this error:
Error: Column `x` is unknown
Can anyone help me use group_by() iteratively with different column combinations?
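This is a common standard evaluation problem with dplyr. dplyr verbs are based on non-standard evaluation, so a column name passed as a quoted string has to be unquoted before group_by() can use it. In your function, group_by(coche_OEM, x) looks for a column literally named x, which is why you get "Column `x` is unknown".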
There are several solutions. This one works well:
groupCombDTC <- function(x) {
OEM_final %>%
group_by(coche_OEM, !!rlang::sym(x)) %>%
filter(dia_hora_OEM == min(dia_hora_OEM))
}
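You need to use !! together with rlang::sym: rlang::sym(x) turns the string into a symbol, and !! unquotes it so that it is evaluated as your column name.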
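To see this work end to end, here is a minimal sketch on a small made-up data frame. The names toy_df and earliest_by and all the values are hypothetical stand-ins, not your real OEM_final; the helper is the same idea as groupCombDTC, except that the data is passed in as an argument. It assumes dplyr and rlang are installed.

```r
library(dplyr)

# Toy stand-in for OEM_final; all values are made up for illustration.
toy_df <- data.frame(
  dia_hora_OEM = as.POSIXct(
    c("2019-12-31 06:40:13", "2019-12-31 06:43:00",
      "2019-12-31 07:11:30", "2019-12-31 07:18:30"),
    tz = "UTC"
  ),
  coche_OEM = factor(c("A", "A", "B", "B")),
  P2000 = factor(c("NULL", "NULL", "P2000", "NULL")),
  U3003 = factor(c("U3003", "NULL", "NULL", "NULL"))
)

# Hypothetical helper: x is a column name given as a string, e.g. "P2000".
earliest_by <- function(df, x) {
  df %>%
    group_by(coche_OEM, !!rlang::sym(x)) %>%    # string -> symbol -> unquoted
    filter(dia_hora_OEM == min(dia_hora_OEM)) %>%
    ungroup()
}

# Iterate over the column names as strings and keep one result per column.
dtc_cols <- c("P2000", "U3003")
results <- lapply(dtc_cols, function(col) earliest_by(toy_df, col))
names(results) <- dtc_cols
```

The for loop in the question fails for a related reason: c(U3003, P2000) looks for objects named U3003 and P2000 in the calling environment, not for columns of OEM_final. Passing the names as strings, c("U3003", "P2000"), avoids that error, and the result of each iteration still has to be stored or printed explicitly.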
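Column names passed as arguments are easier to handle with data.table. If you want more on SE/NSE in dplyr and data.table, you can have a look at a blog post I wrote a few days ago.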
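For comparison, a rough data.table sketch of the same operation could look like the following. Again toy_dt and earliest_by_dt are hypothetical names on made-up data, and this assumes the data.table package is installed. Because by accepts a character vector of column names, the string can be used directly.

```r
library(data.table)

# Toy stand-in for OEM_final; all values are made up for illustration.
toy_dt <- data.table(
  dia_hora_OEM = as.POSIXct(
    c("2019-12-31 06:40:13", "2019-12-31 06:43:00",
      "2019-12-31 07:11:30", "2019-12-31 07:18:30"),
    tz = "UTC"
  ),
  coche_OEM = c("A", "A", "B", "B"),
  P2000 = c("NULL", "NULL", "P2000", "NULL"),
  U3003 = c("U3003", "NULL", "NULL", "NULL")
)

# Hypothetical helper: x is a column name given as a string.
earliest_by_dt <- function(dt, x) {
  # Within each (coche_OEM, x) group, keep the rows with the earliest timestamp.
  dt[, .SD[dia_hora_OEM == min(dia_hora_OEM)], by = c("coche_OEM", x)]
}

results_dt <- lapply(c("P2000", "U3003"),
                     function(col) earliest_by_dt(toy_dt, col))
```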