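I need to summarize a data.frame across multiple columns in a generic way:
the first summarize operation is easy, e.g. a simple median, and is straightforward;
the second summarize then involves a condition on another column, e.g. taking the value at the row where another column is smallest (by group):
library(dplyr)
set.seed(4)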
myDF = data.frame(i = rep(1:3, each=3),
                  j = rnorm(9),
                  a = sample.int(9),
                  b = sample.int(9),
                  c = sample.int(9),
                  d = 'foo')
#   i          j a b c   d
# 1 1  0.2167549 4 5 5 foo
# 2 1 -0.5424926 7 7 4 foo
# 3 1  0.8911446 3 9 1 foo
# 4 2  0.5959806 8 6 8 foo
# 5 2  1.6356180 6 8 3 foo
# 6 2  0.6892754 1 4 6 foo
# 7 3 -1.2812466 9 1 7 foo
# 8 3 -0.2131445 5 2 2 foo
# 9 3  1.8965399 2 3 9 foo
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric), median, .names="med_{col}"),
            best_a = a[[which.min(j)]],
            best_b = b[[which.min(j)]],
            best_c = c[[which.min(j)]])
# # A tibble: 3 x 8
#       i  med_j med_a med_b med_c best_a best_b best_c
# * <int>  <dbl> <int> <int> <int>  <int>  <int>  <int>
# 1     1  0.217     4     7     4      7      7      4
# 2     2  0.689     6     6     6      8      6      8
# 3     3 -0.213     5     2     7      9      1      7
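How can I define the second summarize operation in a generic way (i.e., not manually as above)?
So I need something like this (which obviously does not work, because j is not recognized):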
myfns = list(med = ~median(.),
             best = ~.[[which.min(j)]])
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric), myfns, .names="{fn}_{col}"))
# Error: Problem with `summarise()` input `..1`.
# x object 'j' not found
# ℹ Input `..1` is `across(where(is.numeric), myfns, .names = "{fn}_{col}")`.
# ℹ The error occurred in group 1: i = 1.
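A possible explanation for this error, offered as an assumption rather than something the thread confirms: an R formula keeps a reference to the environment in which it was written. The formulas in myfns are created at top level, outside summarize, so j may be looked up there instead of among the columns of the data. The base R sketch below only illustrates the environment-capturing behaviour; make_formula() is a hypothetical helper, and nothing here calls dplyr.

```r
# A formula remembers the environment in which it was written.
make_formula <- function() {
  j <- c(3, 1, 2)      # local variable, visible only inside this function
  ~ which.min(j)       # the formula created here captures this environment
}

f_inside  <- make_formula()
f_outside <- ~ which.min(j)   # written at top level, like the formulas in myfns

# `j` can be found from the first formula's own environment ...
exists("j", envir = environment(f_inside), inherits = FALSE)

# ... so evaluating its right-hand side there succeeds.
eval(f_inside[[2]], environment(f_inside))   # 2

# The second formula points at the top-level environment instead,
# where `j` exists only if you happened to define it there.
environment(f_outside)
```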
Use another across over the columns a:c to get the corresponding values at the row where j is smallest.
library(dplyr)
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric), median, .names="med_{col}"),
            across(a:c, ~.[which.min(j)], .names = 'best_{col}'))
#       i  med_j med_a med_b med_c best_a best_b best_c
# * <int>  <dbl> <int> <int> <int>  <int>  <int>  <int>
# 1     1  0.217     4     7     4      7      7      4
# 2     2  0.689     6     6     6      8      6      8
# 3     3 -0.213     5     2     7      9      1      7
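The question indexes with `[[` while this answer uses `[`. For an atomic vector and a single position the two return the same value, so the results above agree. They differ when which.min() finds nothing, for instance when every j in a group is NA. The base R sketch below uses made-up vectors (not the data from the question) to show both cases.

```r
# Made-up values for one group; `j` decides which element of `a` is picked.
j <- c(0.5, -1.0, 2.0)
a <- c(10L, 20L, 30L)

idx <- which.min(j)   # position of the smallest j
a[idx]                # 20
a[[idx]]              # 20, same value for a single position

# If every value is NA, which.min() returns an empty index.
empty <- which.min(c(NA_real_, NA_real_))
length(empty)         # 0

a[empty]              # zero-length integer vector, no error
# `[[` refuses an empty index and raises an error instead:
res <- tryCatch(a[[empty]], error = function(e) "error")
res                   # "error"
```

Which behaviour is preferable depends on whether an all-NA group should fail loudly or yield an empty result.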
To do this in the same across statement:
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric),
                   list(med = median,
                        best = ~.[which.min(j)]),
                   .names="{fn}_{col}"))
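As a quick check of the single-statement form, here is a small sketch on a hand-made data frame (toy and its values are hypothetical, chosen so the result can be verified by eye and does not depend on random numbers). Two details are worth noticing: where(is.numeric) also matches j, so a best_j column appears, and the output columns are grouped by input column rather than by function.

```r
library(dplyr)

# Hypothetical toy data: two groups of three rows each.
toy <- data.frame(i = rep(1:2, each = 3),
                  j = c(0.5, -1.0, 2.0, 3.0, 1.5, 0.1),
                  a = c(10L, 20L, 30L, 40L, 50L, 60L),
                  b = c(1L, 2L, 3L, 4L, 5L, 6L))

res <- toy %>%
  group_by(i) %>%
  summarize(across(where(is.numeric),
                   list(med  = median,
                        best = ~ .x[which.min(j)]),  # value at the row with the smallest j
                   .names = "{fn}_{col}"))

# Columns are ordered per input column, and j is summarized as well:
names(res)
# "i" "med_j" "best_j" "med_a" "best_a" "med_b" "best_b"

res$best_a   # 20 60: in group 1 the smallest j is row 2, in group 2 it is row 6
res$med_a    # 20 50
```

If best_j is unwanted, restricting the second function to a:c with a separate across, as in the first form of this answer, avoids it.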