dplyr :: mutate给出x / y = NA，汇总给出x / y =实数

本杰明

我正在实验室中验证一种用于计算通过率的函数。这背后的数学非常简单：给定许多通过或未通过的测试，通过的百分比。

数据将作为一列值提供P1（分别为（在第一次测试中通过），F1（在第一次测试中失败）P2或F2（分别在第二次测试中通过或失败）。我在passRate下面编写了函数，以帮助整体计算通过率（第一次和第二次通过），并分别对第一项测试和第二项测试进行计算。

设置验证参数的质量专家给了我通过和未通过计数的列表，我将使用以下test_vector函数将其转换为向量。

一切都很好，直到到达Pass数据框的第三行为止，该行包含来自质量专家的通过/失败计数。而不是返回第二次测试通过率100％，它返回NA ...但仅当我使用时mutate

library(dplyr)

Pass <- structure(list(P1 = c(2L, 0L, 10L), 
                       F1 = c(0L, 2L, 0L), 
                       P2 = c(0L, 3L, 2L), 
                       F2 = c(0L, 2L, 0L), 
                       id = 1:3), 
                  .Names = c("P1", "F1", "P2", "F2", "id"), 
                  class = c("tbl_df", "data.frame"), 
                  row.names = c(NA, -3L))

所以这类似于我所做的事情mutate。

Pass %>%
  group_by(id) %>%
  mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
         pass_rate1 = P1 / (P1 + F1) * 100,
         pass_rate2 = P2 / (P2 + F2) * 100)

Source: local data frame [3 x 8]
Groups: id [3]

     P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
  (int) (int) (int) (int) (int)     (dbl)      (dbl)      (dbl)
1     2     0     0     0     1 100.00000        100         NA
2     0     2     3     2     2  42.85714          0         60
3    10     0     3     1     3 100.00000        100         NA

使用时比较 summarise

Pass %>%
  group_by(id) %>%
  summarise(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
            pass_rate1 = P1 / (P1 + F1) * 100,
            pass_rate2 = P2 / (P2 + F2) * 100)

Source: local data frame [3 x 4]

     id pass_rate pass_rate1 pass_rate2
  (int)     (dbl)      (dbl)      (dbl)
1     1 100.00000        100         NA
2     2  42.85714          0         60
3     3 100.00000        100        100

我希望这些返回相同的结果。我的猜测是mutate某处有问题，因为它假定n每个组的行都应映射到n结果中的行（在n这里计算时会感到困惑吗？），尽管summarise知道无论它以多少行开头，都将以结尾结尾1。

是否有人对这种行为背后的机制有任何想法？

卡斯滕·奥皮兹（Carsten Oppitz）

在我看来，dplyr和之间存在一些干扰plyr。我在另一个不平衡的数据集上也遇到了同样的问题（因此需要分组），正是在第三组中，变异变量错误地是NA！然后，我在家复制了您的示例。首先，之后

library("dplyr", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.2")

我完全得到您的结果。然后，我执行了自己的脚本，该脚本plyr已在其中加载。在警告不加载plyr之后dplyr，第三组中的NA不复存在，您的示例也被正确计算了！这是我所做的（我又添加了一行以查看NA是否仍留在第三组中）：

> Pass <- structure(list(P1 = c(2L, 0L, 10L,8L), 
+                        F1 = c(0L, 2L, 0L, 4L), 
+                        P2 = c(0L, 3L, 2L, 2L), 
+                        F2 = c(0L, 2L, 0L, 1L), 
+                        id = 1:4), 
+                   .Names = c("P1", "F1", "P2", "F2", "id"), 
+                   class = c("tbl_df", "data.frame"), 
+                   row.names = c(NA, -4L))
> Pass %>%
+     group_by(id) %>%
+     mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
+            pass_rate1 = P1 / (P1 + F1) * 100,
+            pass_rate2 = P2 / (P2 + F2) * 100)
Source: local data frame [4 x 8]
Groups: id [4]

 P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
(int) (int) (int) (int) (int)     (dbl)      (dbl)      (dbl)
 1     2     0     0     0     1 100.00000  100.00000         NA
 2     0     2     3     2     2  42.85714    0.00000   60.00000
 3    10     0     2     0     3 100.00000  100.00000         NA
 4     8     4     2     1     4  66.66667   66.66667   66.66667

然后我做了：

> library("plyr", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.2")
> Pass %>%
+     group_by(id) %>%
+     mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
+            pass_rate1 = P1 / (P1 + F1) * 100,
+            pass_rate2 = P2 / (P2 + F2) * 100)
Source: local data frame [4 x 8]
Groups: id [4]

 P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
(int) (int) (int) (int) (int)     (dbl)      (dbl)      (dbl)
 1     2     0     0     0     1 100.00000  100.00000        NaN
 2     0     2     3     2     2  42.85714    0.00000   60.00000
 3    10     0     2     0     3 100.00000  100.00000  100.00000
 4     8     4     2     1     4  66.66667   66.66667   66.66667

我知道这是不是一个令人满意的答案，因为plyr应该不经过加载dplyr，但也许它可以帮助那些是谁需要group_by(id)。或使用plyr::mutate()。然后，您可以dplyr在之后加载plyr：

 > Pass %>%
+     group_by(id) %>%
+     plyr::mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
+            pass_rate1 = P1 / (P1 + F1) * 100,
+            pass_rate2 = P2 / (P2 + F2) * 100)
Source: local data frame [4 x 8]
Groups: id [4]

 P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
(int) (int) (int) (int) (int)     (dbl)      (dbl)      (dbl)
 1     2     0     0     0     1 100.00000  100.00000        NaN
 2     0     2     3     2     2  42.85714    0.00000   60.00000
 3    10     0     2     0     3 100.00000  100.00000  100.00000
 4     8     4     2     1     4  66.66667   66.66667   66.66667

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2020-10-30

我来说两句

0 条评论

登录后参与评论

上一篇：为什么在下面的函数中递增数组“ a”不是错误？

TOP 榜单

文章

dplyr :: mutate给出x / y = NA，汇总给出x / y =实数

dplyr :: mutate给出x / y = NA，汇总给出x / y =实数

蓝屏死机没有修复解决方案

计算数据帧中每行的NA

UITableView的项目向下滚动后更改颜色，然后快速备份

Node.js中未捕获的异常错误，发生调用

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

Linux的官方Adobe Flash存储库是否已过时？

验证REST API参数

ggplot：对齐多个分面图-所有大小不同的分面

Mac OS X更新后的GRUB 2问题

通过 Git 在运行 Jenkins 作业时获取 ClassNotFoundException

带有错误“ where”条件的查询如何返回结果？

用日期数据透视表和日期顺序查询

VB.net将2条特定行导出到DataGridView

如何从视图一次更新多行（ASP.NET - Core）

Java Eclipse中的错误13，如何解决？

尝试反复更改屏幕上按钮的位置 - kotlin android studio

离子动态工具栏背景色

应用发明者仅从列表中选择一个随机项一次

当我尝试下载 StanfordNLP en 模型时，出现错误

python中的boto3文件上传

在同一Pushwoosh应用程序上Pushwoosh多个捆绑ID