Summarise multiple variables with a function of two variables

阿里斯塔特

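The summarise_if function is very helpful for summarising multiple variables. Suppose I need the mean of every numeric variable in the dataset. I can use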

library(dplyr)

df <- as_tibble(iris)
df %>% summarise_if(is.numeric, .funs = mean)
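For comparison, here is a minimal sketch of the same summary written with across(), assuming dplyr 1.0.0 or later (where across() and where() are available). The names res_if and res_across are just illustrative.

```r
library(dplyr)

df <- as_tibble(iris)

# Scoped-verb form, as in the question
res_if <- df %>% summarise_if(is.numeric, .funs = mean)

# Equivalent across() form: where() selects columns by a predicate
res_across <- df %>% summarise(across(where(is.numeric), mean))

# Both give one row holding the means of the four numeric columns
print(res_if)
print(res_across)
```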

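This works perfectly. But now suppose the function in .funs takes two arguments that both come from the dataset (one example is weighted.mean, where the weight variable is Sepal.Length). I tried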

df %>% summarise_if(is.numeric, .funs = function(x, w) weighted.mean(x, w), w = Sepal.Length)

The error is

Error in list2(...) : object 'Sepal.Width' not found

I suspect R is not looking for Sepal.Length inside df but in the global environment instead. So I have to use

df %>% summarise_if(is.numeric, .funs = function(x, w) weighted.mean(x, w), w = df$Sepal.Length)
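The list2(...) error suggests that the extra arguments passed through summarise_if()'s ... are evaluated as ordinary R arguments, outside the data frame, so a bare column name is not found. Below is a minimal base-R sketch of that difference. It uses with() as a stand-in for a data mask, which is an illustrative assumption and not how dplyr implements its own masking.

```r
# Base R only; iris ships with R in the datasets package
df <- iris

# Outside the data frame, a bare column name is not an object of its own
found_globally <- exists("Sepal.Length")

# Inside a data mask (base R's with() here), column names resolve to columns
wm <- with(df, weighted.mean(Sepal.Width, w = Sepal.Length))

# The same value computed from explicit df$ columns
wm_explicit <- weighted.mean(df$Sepal.Width, w = df$Sepal.Length)

print(found_globally)  # FALSE in a fresh session
print(wm)
```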

This works, but passing df$Sepal.Length is not a good solution. For example, it makes it impossible to compute the weighted mean by group:

df %>% group_by(Species) %>% summarise_if(is.numeric, .funs = function(x, w) weighted.mean(x, w), w = df$Sepal.Length)

Error: Problem with summarise() column Sepal.Length.
ℹ Sepal.Length = (function (x, w) ....
x 'x' and 'w' must have the same length
ℹ The error occurred in group 1: Species = setosa.
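The length error follows from df$Sepal.Length always being the full 150-row column, while each Species group in iris holds only 50 rows. A small base-R sketch reproducing the same mismatch outside dplyr (the setosa subset stands in for "group 1"):

```r
df <- iris

# df$Sepal.Length is the whole column, regardless of any grouping
n_total <- length(df$Sepal.Length)

# Each Species group in iris has 50 rows
group_sizes <- table(df$Species)

# weighted.mean() stops when x and w differ in length, as it did in group 1
msg <- tryCatch(
  weighted.mean(df$Sepal.Width[df$Species == "setosa"], w = df$Sepal.Length),
  error = function(e) conditionMessage(e)
)
print(msg)
```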

So how can I use summarise_if or summarise_at with a function that involves two variables from the dataset?

akrun

If we need to use Sepal.Length as w, concatenate (c) the output of where(is.numeric) with -Sepal.Length so that column is removed from across, then apply weighted.mean to the remaining numeric columns with w set to Sepal.Length

library(dplyr)
df %>% 
   summarise(across(c(where(is.numeric), -Sepal.Length), 
        ~ weighted.mean(., w = Sepal.Length)))
# A tibble: 1 × 3
  Sepal.Width Petal.Length Petal.Width
        <dbl>        <dbl>       <dbl>
1        3.05         3.97        1.29
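As a sanity check on the output above, a weighted mean is sum(x * w) / sum(w). This sketch recomputes that formula by hand for each remaining column and compares it with the across() result; the names res, w and manual are only illustrative, and dplyr 1.0.0 or later is assumed.

```r
library(dplyr)

df <- as_tibble(iris)

# The answer's across() call, with Sepal.Length excluded and used as weight
res <- df %>%
  summarise(across(c(where(is.numeric), -Sepal.Length),
                   ~ weighted.mean(., w = Sepal.Length)))

# Manual check: sum(x * w) / sum(w) for each of the other numeric columns
w <- df$Sepal.Length
manual <- sapply(df[c("Sepal.Width", "Petal.Length", "Petal.Width")],
                 function(x) sum(x * w) / sum(w))

print(res)
print(manual)
```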

Or, a grouped version would be

df %>%
   group_by(Species) %>% 
   summarise(across(c(where(is.numeric), -Sepal.Length), 
        ~ weighted.mean(., w = Sepal.Length)))

-output

# A tibble: 3 × 4
  Species    Sepal.Width Petal.Length Petal.Width
  <fct>            <dbl>        <dbl>       <dbl>
1 setosa            3.45         1.47       0.248
2 versicolor        2.78         4.29       1.34 
3 virginica         2.99         5.60       2.03
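To confirm that the weights are taken from within each group (the part that failed with df$Sepal.Length), one can split the data by Species in base R and weight inside each piece. A minimal sketch for the Sepal.Width column only, assuming dplyr 1.0.0 or later for the grouped call:

```r
library(dplyr)

df <- as_tibble(iris)

grouped <- df %>%
  group_by(Species) %>%
  summarise(across(c(where(is.numeric), -Sepal.Length),
                   ~ weighted.mean(., w = Sepal.Length)))

# Base R check: split by group, then weight with that group's own Sepal.Length
check <- sapply(split(df, df$Species),
                function(g) weighted.mean(g$Sepal.Width, w = g$Sepal.Length))

print(check)
```

Both group_by() and split() order the groups by factor level (setosa, versicolor, virginica), which is why the two results can be compared position by position.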

Note: the _if, _at and _all suffixed functions have been superseded by across
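For readers migrating other scoped verbs, here is a sketch of how the _all and _at forms translate to across(), assuming dplyr 1.0.0 or later. The object names are illustrative; each pair should produce the same columns and values.

```r
library(dplyr)

df <- as_tibble(iris)

# summarise_all(f)  ->  summarise(across(everything(), f))
all_old <- df %>% select(-Species) %>% summarise_all(mean)
all_new <- df %>% select(-Species) %>% summarise(across(everything(), mean))

# summarise_at(vars(...), f)  ->  summarise(across(c(...), f))
at_old <- df %>% summarise_at(vars(Sepal.Width, Petal.Width), mean)
at_new <- df %>% summarise(across(c(Sepal.Width, Petal.Width), mean))

# summarise_if(pred, f) maps to summarise(across(where(pred), f)),
# as used in the answer above
print(all_new)
print(at_new)
```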
