Using group_by() with two factors in R via lapply

前任 asks:

I have this data frame (named OEM_final). This is its structure:

str(OEM_final)
'data.frame':   2265 obs. of  17 variables:
 $ dia_hora_OEM : POSIXct, format: "2019-12-31 06:40:13" "2019-12-31 06:43:00" "2019-12-31 07:11:30" "2019-12-31 07:18:30" ...
 $ coche_OEM    : Factor w/ 6 levels "356232050832996",..: 3 3 3 3 3 3 3 3 6 6 ...
 $ DTC_OEM_dec64: chr  "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ "[{\"code\":\"B1182\",\"description\":\"Tire pressure monitor module\",\"faultInformations\":[{\"description\":\"| __truncated__ ...
 $ rowname      : Factor w/ 2265 levels "1","10","100",..: 1 1112 1489 1600 1711 1822 1933 2044 2155 2 ...
 $ B1182        : Factor w/ 2 levels "B1182","NULL": 1 1 1 1 1 1 1 1 2 2 ...
 $ B124D        : Factor w/ 2 levels "B124D","NULL": 1 1 1 1 1 1 1 1 2 2 ...
 $ NA.          : Factor w/ 6 levels "c(NA, NA, NA, NA, NA, NA, NA, NA)",..: 3 3 3 3 3 3 3 3 1 1 ...
 $ P2000        : Factor w/ 2 levels "c(\"P2000\", \"P2000\", \"P2000\")",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ U3003        : Factor w/ 2 levels "NULL","U3003": 1 1 1 1 1 1 1 1 1 1 ...
 $ B1D01        : Factor w/ 3 levels "B1D01","c(\"B1D01\", \"B1D01\")",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ U0155        : Factor w/ 2 levels "NULL","U0155": 1 1 1 1 1 1 1 1 1 1 ...
 $ C1B00        : Factor w/ 2 levels "C1B00","NULL": 2 2 2 2 2 2 2 2 2 2 ...
 $ P037D        : Factor w/ 2 levels "NULL","P037D": 1 1 1 1 1 1 1 1 1 1 ...
 $ P0616        : Factor w/ 2 levels "NULL","P0616": 1 1 1 1 1 1 1 1 1 1 ...
 $ P0562        : Factor w/ 2 levels "NULL","P0562": 1 1 1 1 1 1 1 1 1 1 ...
 $ U0073        : Factor w/ 2 levels "NULL","U0073": 1 1 1 1 1 1 1 1 1 1 ...
 $ P0138        : Factor w/ 2 levels "c(\"P0138\", \"P0138\", \"P0138\")",..: 2 2 2 2 2 2 2 2 2 2 ...

I want to find the earliest date (dia_hora_OEM) within each group defined by two factors. The two factors are:

One factor, common to every combination, is coche_OEM.
The other is each column from the 8th (P2000) to the last (P0138), taken one at a time.

So the group_by() calls should be:

group_by(coche_OEM, P2000)
group_by(coche_OEM, U3003)
group_by(coche_OEM, B1D01)
group_by(coche_OEM, U0155)
...

I tried different approaches:

Using a `for` loop:

for (DTC in c(U3003, P2000)) {
  OEM_final %>%
  group_by(DTC, coche_OEM) %>%
  filter(dia_hora_OEM == min(dia_hora_OEM))
}

But I get an error saying:

Error in c(U3003, P2000) : object 'U3003' not found

Using `lapply`:

In this case, I created a function:

groupCombDTC <- function(x) {
  OEM_final %>%
  group_by(coche_OEM, x) %>%
  filter(dia_hora_OEM == min(dia_hora_OEM))
}

Then I ran lapply():

lapply(colnames(OEM_final)[8:17], groupCombDTC)

I get this error:

Error: Column `x` is unknown

Can anyone help me call group_by() iteratively with the different combinations?

地震 answers:

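This is a classic standard-evaluation problem with dplyr. dplyr relies on non-standard evaluation, so group_by(coche_OEM, x) looks for a column literally named x. A column name held in a string has to be turned into a symbol and unquoted before group_by() can use it.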

There are several solutions. This one works well:

groupCombDTC <- function(x) {
  OEM_final %>%
  group_by(coche_OEM, !!rlang::sym(x)) %>%
  filter(dia_hora_OEM == min(dia_hora_OEM))
}

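You need to use !! together with rlang::sym: rlang::sym() converts the string into a symbol, and !! unquotes it so that group_by() evaluates it as a column name.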
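
To try the pattern without the full OEM_final data, here is a minimal sketch on a small made-up data frame. The object `toy`, its values, and the helper names `earliest_by` and `earliest_by_across` are assumptions for illustration only. Only the column names come from the question. The second helper shows one of the other possible solutions, `across(all_of(x))`, which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for OEM_final: same column names, invented values
toy <- data.frame(
  dia_hora_OEM = as.POSIXct(c("2019-12-31 06:40:13", "2019-12-31 06:43:00",
                              "2019-12-31 07:11:30", "2019-12-31 07:18:30"),
                            tz = "UTC"),
  coche_OEM = factor(c("A", "A", "B", "B")),
  P2000 = factor(c("NULL", "P2000", "NULL", "NULL")),
  U3003 = factor(c("U3003", "U3003", "NULL", "U3003"))
)

# x is a column name given as a string; sym() + !! turns it into a column reference
earliest_by <- function(x) {
  toy %>%
    group_by(coche_OEM, !!rlang::sym(x)) %>%
    filter(dia_hora_OEM == min(dia_hora_OEM)) %>%
    ungroup()
}

# Alternative (assumes dplyr >= 1.0.0): select the grouping column by name
earliest_by_across <- function(x) {
  toy %>%
    group_by(coche_OEM, across(all_of(x))) %>%
    filter(dia_hora_OEM == min(dia_hora_OEM)) %>%
    ungroup()
}

# Pass the column names as strings, one grouping per column
dtc_cols <- c("P2000", "U3003")
results <- lapply(dtc_cols, earliest_by)
names(results) <- dtc_cols
results_across <- lapply(dtc_cols, earliest_by_across)
```

Each element of `results` holds the earliest row(s) for every (coche_OEM, code) pair. On the real data, the equivalent call would presumably be `lapply(colnames(OEM_final)[8:17], groupCombDTC)`, as in the question.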

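Column names passed as arguments are easier to handle with data.table. If you want more on SE/NSE in dplyr and data.table, you can have a look at a blog post I wrote a few days ago.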
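
As a rough illustration of the data.table remark, the sketch below relies on `by` accepting a character vector of column names, so no quoting tools are needed. The object `toy_dt`, its values, and the helper name `earliest_by_dt` are made up for this example. This is one possible way to write it, not necessarily what the blog post shows.

```r
library(data.table)

# Hypothetical stand-in for OEM_final: same column names, invented values
toy_dt <- data.table(
  dia_hora_OEM = as.POSIXct(c("2019-12-31 06:40:13", "2019-12-31 06:43:00",
                              "2019-12-31 07:11:30", "2019-12-31 07:18:30"),
                            tz = "UTC"),
  coche_OEM = factor(c("A", "A", "B", "B")),
  P2000 = factor(c("NULL", "P2000", "NULL", "NULL")),
  U3003 = factor(c("U3003", "U3003", "NULL", "U3003"))
)

# `by` takes column names as strings, so x can be used directly
earliest_by_dt <- function(x) {
  toy_dt[, .SD[dia_hora_OEM == min(dia_hora_OEM)], by = c("coche_OEM", x)]
}

dtc_cols <- c("P2000", "U3003")
results_dt <- lapply(dtc_cols, earliest_by_dt)
names(results_dt) <- dtc_cols
```

Here `.SD` is the subset of rows in each group, filtered down to the rows with the earliest timestamp.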

Edited on 2020-04-4
