How to summarize multiple columns with dplyr, conditioning on another (grouped) column?

Posted by ztl in Dev

ztl

I need to summarize across multiple columns of a data.frame in a generic way:

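the first summarize operation is straightforward, e.g. a simple median;
the second summarize involves a condition on another column, e.g. taking the value in the row where another column is at its minimum (per group):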

library(dplyr)

set.seed(4)

myDF = data.frame(i = rep(1:3, each=3),
                  j = rnorm(9),
                  a = sample.int(9),
                  b = sample.int(9),
                  c = sample.int(9),
                  d = 'foo')
#   i          j a b c   d
# 1 1  0.2167549 4 5 5 foo
# 2 1 -0.5424926 7 7 4 foo
# 3 1  0.8911446 3 9 1 foo
# 4 2  0.5959806 8 6 8 foo
# 5 2  1.6356180 6 8 3 foo
# 6 2  0.6892754 1 4 6 foo
# 7 3 -1.2812466 9 1 7 foo
# 8 3 -0.2131445 5 2 2 foo
# 9 3  1.8965399 2 3 9 foo

myDF %>% group_by(i) %>% summarize(across(where(is.numeric), median, .names="med_{col}"),
                                   best_a = a[[which.min(j)]],
                                   best_b = b[[which.min(j)]],
                                   best_c = c[[which.min(j)]])
# # A tibble: 3 x 8
#      i   med_j med_a med_b med_c best_a best_b best_c
# * <int>   <dbl> <int> <int> <int>  <int>  <int>  <int>
# 1     1  0.217     4     7     4      7      7      4
# 2     2  0.689     6     6     6      8      6      8
# 3     3 -0.213     5     2     7      9      1      7

How can the second summarize operation be defined in a generic way (i.e. not manually, as above)?

So I need something like this (which obviously does not work, because j is not recognized):

myfns = list(med = ~median(.),
             best = ~.[[which.min(j)]])
myDF %>% group_by(i) %>% summarize(across(where(is.numeric), myfns, .names="{fn}_{col}"))
# Error: Problem with `summarise()` input `..1`.
# x object 'j' not found
# ℹ Input `..1` is `across(where(is.numeric), myfns, .names = "{fn}_{col}")`.
# ℹ The error occurred in group 1: i = 1.

Ronak Shah

Use another across to get the values of columns a:c in the row where j is at its minimum.

library(dplyr)

myDF %>% 
  group_by(i) %>% 
  summarize(across(where(is.numeric), median, .names="med_{col}"),
            across(a:c, ~ .[which.min(j)], .names = "best_{col}"))

#      i  med_j med_a med_b med_c best_a best_b best_c
#* <int>  <dbl> <int> <int> <int>  <int>  <int>  <int>
#1     1  0.217     4     7     4      7      7      4
#2     2  0.689     6     6     6      8      6      8
#3     3 -0.213     5     2     7      9      1      7
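
One likely reason the function list defined outside summarize fails in the question is that a formula lambda remembers the environment it was created in, so j is looked up in the global environment rather than in the grouped data. A minimal sketch that keeps the "best" logic reusable while still referring to j inside the call is given below. Note that value_at_min is a hypothetical helper name (not part of dplyr), and the data is the myDF from the question.

```r
library(dplyr)

set.seed(4)
myDF <- data.frame(i = rep(1:3, each = 3),
                   j = rnorm(9),
                   a = sample.int(9),
                   b = sample.int(9),
                   c = sample.int(9),
                   d = "foo")

# Hypothetical helper (not part of dplyr):
# returns the element of x at the position where `by` is smallest.
value_at_min <- function(x, by) x[[which.min(by)]]

res <- myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric), median, .names = "med_{col}"),
            # The lambda is written inside summarize(), so `j` resolves
            # to the current group's j column.
            across(a:c, ~ value_at_min(.x, j), .names = "best_{col}"))

res
```

Swapping j for another column inside the lambda changes the conditioning column without touching the helper.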

To do this in a single across statement:

myDF %>% 
  group_by(i) %>% 
  summarize(across(where(is.numeric), list(med = median, 
                                           best = ~.[which.min(j)]), 
                                      .names="{fn}_{col}"))
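
As a sketch of what the single-across form returns: because j is itself numeric, where(is.numeric) also selects it, so the result should contain a best_j column (the per-group minimum of j) alongside the others. Assuming that column is not wanted, it can be dropped afterwards. The names res_all and res are illustrative only.

```r
library(dplyr)

set.seed(4)
myDF <- data.frame(i = rep(1:3, each = 3),
                   j = rnorm(9),
                   a = sample.int(9),
                   b = sample.int(9),
                   c = sample.int(9),
                   d = "foo")

res_all <- myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric),
                   list(med = median, best = ~ .x[which.min(j)]),
                   .names = "{fn}_{col}"))

# Output columns are ordered by source column, then by function:
# med_j, best_j, med_a, best_a, ...
names(res_all)

# Drop best_j if it is not needed (assumption: only a:c matter for "best").
res <- res_all %>% select(-best_j)
```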
