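I have a data frame that I want to summarise using dplyr. The data frame contains several factors, and for each group I want to report the count of each level of each factor.
Is there a way to do the following with dplyr without having to name every factor level in the summarise statement?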
library(dplyr)
set.seed(123)
s <- rbinom(100,1,0.5)
s <- factor(s,0:1,c('M','F'))
a <- sample(1:4,100,TRUE)
a <- factor(a,1:4,c('oldest','old','young','youngest'))
w <- rnorm(100,40,10)
g <- rep(1:2,each=50)
df <- data.frame(sex=s, age=a, weight=w, group=g)
sm <- df %>%
  group_by(group) %>%
  summarise(
    male     = sum(ifelse(sex == 'M', 1, 0)),
    female   = sum(ifelse(sex == 'F', 1, 0)),
    youngest = sum(ifelse(age == 'youngest', 1, 0)),
    young    = sum(ifelse(age == 'young', 1, 0)),
    old      = sum(ifelse(age == 'old', 1, 0)),
    oldest   = sum(ifelse(age == 'oldest', 1, 0)),
    weight   = mean(weight)
  )
print(t(sm))
Result:
           [,1]     [,2]
group     1.000  2.00000
male     29.000 24.00000
female   21.000 26.00000
youngest 12.000  8.00000
young    13.000 17.00000
old      12.000 18.00000
oldest   13.000  7.00000
weight   37.461 40.38807
Using dplyr together with spread() from tidyr (albeit in a circuitous way!):
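library(tidyr)  # spread() comes from tidyr, not dplyr

df %>%
  # Two row-index columns give spread() a value column for each factor
  mutate(row_number1 = row_number(), row_number2 = row_number()) %>%
  spread(sex, row_number1) %>%   # one column per sex level
  spread(age, row_number2) %>%   # one column per age level
  group_by(group) %>%
  # Turn the level columns into 0/1 indicators (NA means the row lacks that level)
  mutate_each(funs(ifelse(is.na(.), 0, 1)), -weight) %>%
  mutate(count = 1) %>%
  summarize_each(funs(sum)) %>%  # indicator sums are the per-level counts
  mutate(weight = weight / (count)) %>%  # summed weight / group size = mean
  select(-count) %>%
  t()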
Result:
           [,1]     [,2]
group     1.000  2.00000
weight   37.461 40.38807
M        25.000 28.00000
F        25.000 22.00000
oldest   13.000  7.00000
old      12.000 18.00000
young    13.000 17.00000
youngest 12.000  8.00000
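For what it's worth, mutate_each(), summarize_each() and funs() have since been deprecated in dplyr, so here is a rough sketch of the same idea with newer verbs. It is not the approach from the answer above, just one possible alternative. It assumes dplyr 1.0 or later and tidyr 1.1 or later, and it assumes that level names do not collide across factors (true here: M/F versus the age labels). The names level_counts, mean_weight and result are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question so the sketch is self-contained.
set.seed(123)
df <- data.frame(
  sex    = factor(rbinom(100, 1, 0.5), 0:1, c("M", "F")),
  age    = factor(sample(1:4, 100, TRUE), 1:4,
                  c("oldest", "old", "young", "youngest")),
  weight = rnorm(100, 40, 10),
  group  = rep(1:2, each = 50)
)

# Count every level of every factor column within each group,
# without naming any level (or any factor column) explicitly.
level_counts <- df %>%
  select(group, where(is.factor)) %>%
  mutate(across(where(is.factor), as.character)) %>%
  pivot_longer(-group, names_to = "variable", values_to = "level") %>%
  count(group, level) %>%
  pivot_wider(names_from = level, values_from = n, values_fill = 0L)

# Numeric summaries are computed separately and joined on the grouping column.
mean_weight <- df %>%
  group_by(group) %>%
  summarise(weight = mean(weight))

result <- left_join(mean_weight, level_counts, by = "group")
print(t(result))
```

If two factors shared a level name, their counts would be merged into one column here; in that case something like names_from = c(variable, level) in pivot_wider() should keep them apart.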