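I want to generate a descriptive statistics table for my dataset. It is a sample with many categorical variables, and these categorical variables act as filters (grouping variables) for computing means and standard deviations.
Here is a reproducible example: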
# example
var1 <- rep(LETTERS[1:2], 50)   # 100 values, same length as the other columns
var2 <- rep(0.1, 100)
country <- sample(c("Country_A", "Country_B", "Country_C"), 100, replace = TRUE)
age <- round(runif(100, min = 21, max = 70), 0)
# data.frame() keeps age numeric; as.data.frame(cbind(...)) would coerce every column to character
df <- data.frame(var1, var2, country, age)
age_mean <- aggregate(x = df$age, by = list(df$country, df$var1), FUN = mean)
colnames(age_mean) <- c("Country", "var1", "Age")
age_n <- aggregate(x = df$age, by = list(df$country, df$var1), FUN = length)
colnames(age_n) <- c("Country", "var1", "Age_N")
# merge on both grouping columns; merging on "Country" alone pairs every var1 level with every other
df_table_var1 <- merge(age_mean, age_n, by = c("Country", "var1"), all = TRUE)
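A related pitfall in building the data with as.data.frame(cbind(...)): cbind() turns every column into character, and in R versions before 4.0.0 (when stringsAsFactors defaulted to TRUE) as.data.frame() then turned them into factors. Calling as.numeric() on a factor returns its internal level codes, not the numbers shown as labels. A minimal sketch with made-up values:

```r
# Made-up ages stored as a factor, as as.data.frame(cbind(...)) produced before R 4.0.0
age_f <- factor(c("45", "21", "70"))

# Level codes: levels are sorted as text ("21", "45", "70"), so this gives 2 1 3
codes <- as.numeric(age_f)

# Going through the labels recovers the actual values: 45 21 70
ages <- as.numeric(as.character(age_f))

codes
ages
```

Building the data frame with data.frame() directly avoids the conversion altogether.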
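However, I tried to rewrite this code as a loop so that var1 could be replaced by var2, var3, …, with each result stored as a separate object, but it isn't working. The dataset isn't wide, so using a for loop is fine.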
for (i in 3:4) {
  # does not work: paste0(...) cannot be the target of <-
  paste0("x_media", names(df)[i]) <- aggregate(x = df$Age, by = list(df[i], df$var), FUN = mean)
  paste0("x_sd", names(df)[i]) <- aggregate(x = df$Age, by = list(df[i], df$var), FUN = sd)
}
I think writing a function would make this easier, but I couldn't work out how to assign the variable names.
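A function does work if it returns its result instead of trying to create a named variable; the caller then keeps the results in a list. A minimal sketch, where describe_by() is a hypothetical helper name, the grouping column "country" is hard-coded as an assumption, and the small data set is made up for illustration:

```r
# Hypothetical helper: mean, SD and N of `value` for every combination of
# "country" and the grouping column named in `group_var`
describe_by <- function(data, group_var, value = "age") {
  stats <- aggregate(
    data[[value]],
    by = data[c("country", group_var)],
    FUN = function(y) c(Mean = mean(y), SD = sd(y), N = length(y))
  )
  # aggregate() stores the three statistics as one matrix column; flatten it
  do.call(data.frame, stats)
}

# Made-up data (not the question's random sample)
df <- data.frame(
  var1    = c("A", "A", "B", "B", "A", "A", "B", "B"),
  var2    = 0.1,
  country = rep(c("Country_A", "Country_B"), each = 4),
  age     = c(30, 40, 20, 30, 50, 70, 35, 45)
)

# Call the function once per grouping column and keep the results in a named list
results <- lapply(c("var1", "var2"), describe_by, data = df)
names(results) <- c("var1", "var2")
results$var1
```

Because the name is attached to the list element rather than to a global variable, no assign() is needed; results$var2 holds the table for var2, and so on.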
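If we are creating multiple objects in the global environment (which is not recommended), note that paste() on the left-hand side of <- is not evaluated into an object name. We need assign() instead: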
for (i in 1:2) {
  assign(paste0("x_media", names(df)[i]),
         value = aggregate(x = df$age,
                           by = df[c('country', paste0('var', i))], FUN = mean))
  assign(paste0("x_sd", names(df)[i]),
         value = aggregate(x = df$age,
                           by = df[c('country', paste0('var', i))], FUN = sd))
}
Check the objects:
x_mediavar1
# country var1 x
#1 Country_A A 22.04762
#2 Country_B A 22.66667
#3 Country_C A 23.64286
#4 Country_A B 21.50000
#5 Country_B B 23.00000
#6 Country_C B 24.33333
x_sdvar1
# country var1 x
#1 Country_A A 12.08295
#2 Country_B A 12.03252
#3 Country_C A 13.38107
#4 Country_A B 11.03371
#5 Country_B B 16.28451
#6 Country_C B 13.56466
x_mediavar2
# country var2 x
#1 Country_A 0.1 21.79487
#2 Country_B 0.1 22.82759
#3 Country_C 0.1 24.03125
x_sdvar2
# country var2 x
#1 Country_A 0.1 11.53916
#2 Country_B 0.1 14.11747
#3 Country_C 0.1 13.38202
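If separate objects are really wanted, one way to keep them out of the global environment is a sketch like the following: assign() into a dedicated environment created with new.env(), then gather everything with mget(). The tryCatch() call confirms that using paste0() as the target of <- raises an error. The data and the results_env name are made up for illustration:

```r
df <- data.frame(
  var1    = c("A", "B", "A", "B"),
  var2    = 0.1,
  country = c("Country_A", "Country_A", "Country_B", "Country_B"),
  age     = c(30, 40, 50, 60)
)

# R stops here: the innermost assignment target is a string, not a variable name
failed <- tryCatch(
  { paste0("x_media", "var1") <- 1; "no error" },
  error = function(e) e
)

results_env <- new.env()
for (v in c("var1", "var2")) {
  # assign() takes the object name as a string, so it can be built with paste0()
  assign(paste0("x_media", v),
         aggregate(df$age, by = df[c("country", v)], FUN = mean),
         envir = results_env)
}

# Retrieve every generated object at once as a named list
all_means <- mget(ls(results_env), envir = results_env)
names(all_means)
```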
This can also be done with lapply(), storing the results in a list (no need to create many objects):
lst1 <- lapply(names(df)[1:2], function(x)
  do.call(data.frame,
          aggregate(df$age, df[c('country', x)],
                    FUN = function(y) c(Mean = mean(y), SD = sd(y)))))
lst1
#[[1]]
# country var1 x.Mean x.SD
#1 Country_A A 22.04762 12.08295
#2 Country_B A 22.66667 12.03252
#3 Country_C A 23.64286 13.38107
#4 Country_A B 21.50000 11.03371
#5 Country_B B 23.00000 16.28451
#6 Country_C B 24.33333 13.56466
#[[2]]
# country var2 x.Mean x.SD
#1 Country_A 0.1 21.79487 11.53916
#2 Country_B 0.1 22.82759 14.11747
#3 Country_C 0.1 24.03125 13.38202
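Since the goal is a single descriptive table, the list elements can also be stacked. A minimal sketch, assuming every element is given the same column names (here the hypothetical names variable / level / Mean / SD / N) and the grouping values are converted to character so they can share one column; the data are made up:

```r
df <- data.frame(
  var1    = c("A", "A", "B", "B", "A", "A", "B", "B"),
  var2    = 0.1,
  country = rep(c("Country_A", "Country_B"), each = 4),
  age     = c(30, 40, 20, 30, 50, 70, 35, 45)
)
group_vars <- c("var1", "var2")

lst1 <- lapply(group_vars, function(v) {
  out <- do.call(data.frame,
                 aggregate(df$age, df[c("country", v)],
                           FUN = function(y) c(Mean = mean(y), SD = sd(y), N = length(y))))
  # common column names so that every element can be stacked with rbind()
  names(out) <- c("country", "level", "Mean", "SD", "N")
  out$level <- as.character(out$level)  # var1 is character, var2 is numeric
  cbind(variable = v, out)
})
names(lst1) <- group_vars

# One long table: one row per variable x level x country
summary_table <- do.call(rbind, lst1)
rownames(summary_table) <- NULL
summary_table
```

Keeping the list named also lets a single variable be pulled out with lst1$var1 before stacking.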
Or with the tidyverse:
library(purrr)
library(dplyr)
map(names(df)[1:2], ~
  df %>%
    # !! rlang::sym(.x) turns the column name string into a column reference
    group_by(country, !!rlang::sym(.x)) %>%
    summarise(Mean = mean(age), SD = sd(age), .groups = "drop"))