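I am trying to build a table of descriptive statistics from a dataset that contains both numeric and categorical data. I would like my table to look like this:
NA cells can be left blank or omitted.
My data looks like this: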
df <- data.frame(
  id = c(1:6),
  country = c("United Kingdom", "United Kingdom", "United Kingdom",
              "Canada", "Canada", "Germany"),
  gender = c("Male", "Female", "Male", "Female", "Female", "Male"),
  height = c(1.9, 1.8, 2.0, 1.7, 1.9, 2.1),
  play_basketball = c("Yes", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = TRUE
)
Things I have tried:
ftable and prop.table handle the categorical data, but I am not sure how to drop the "No" column and add (freq / total):
table1 <- ftable(df$country, df$gender, df$play_basketball)
prop.table(table1, 1)
                       No Yes
Canada         Female 0.5 0.5
               Male   NaN NaN
Germany        Female NaN NaN
               Male   0.0 1.0
United Kingdom Female 0.0 1.0
               Male   0.5 0.5
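For the numeric data, I know how to calculate each mean and sd by hand, but not how to do it so that the values are added to the table automatically: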
mean(subset(df, country == "United Kingdom" &
gender == "Male")$height, na.rm = TRUE)
sd(subset(df, country == "United Kingdom" &
gender == "Male")$height, na.rm = TRUE)
I tagged this with dplyr because it has gotten me out of trouble before, but I am not looking for a dplyr-only solution.
You can use dplyr::summarise to get all the summary statistics, then stringr::str_glue to build the formatted strings easily.
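Breaking down the calculations the table needs: for each group you have the mean and standard deviation of height, the number of basketball players, the total number of rows, and the share of basketball players out of the total.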
library(dplyr)
calcs <- df %>%
  # put the factor levels in the order the table should display them
  mutate(gender = forcats::fct_relevel(gender, "Male"),
         country = forcats::fct_relevel(country, "United Kingdom", "Canada")) %>%
  group_by(country, gender) %>%
  summarise(mean_height = round(mean(height, na.rm = TRUE), digits = 2),
            sd_height = round(sd(height, na.rm = TRUE), digits = 2),
            count_bball = sum(play_basketball == "Yes"),
            n = n(),
            share_bball = count_bball / n) %>%
  ungroup() %>%
  # sd() is NA for groups with a single row; show those as 0
  tidyr::replace_na(list(sd_height = 0))
calcs
#> # A tibble: 4 x 7
#>   country        gender mean_height sd_height count_bball     n share_bball
#>   <fct>          <fct>        <dbl>     <dbl>       <int> <int>       <dbl>
#> 1 United Kingdom Male          1.95      0.07           1     2         0.5
#> 2 United Kingdom Female        1.8       0              1     1         1
#> 3 Canada         Female        1.8       0.14           1     2         0.5
#> 4 Germany        Male          2.1       0              1     1         1
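As a quick sanity check, the grouped summary can be compared with the manual calculation from the question. The sketch below uses base R only and rebuilds just the columns it needs; the variable names (uk_male, manual_mean, manual_sd, single_sd) are made up for illustration. It also shows why the replace_na step is there: sd() returns NA for a group with a single observation.

```r
# Rebuild only the columns needed for the check (same values as the question's data)
df <- data.frame(
  country = c("United Kingdom", "United Kingdom", "United Kingdom",
              "Canada", "Canada", "Germany"),
  gender = c("Male", "Female", "Male", "Female", "Female", "Male"),
  height = c(1.9, 1.8, 2.0, 1.7, 1.9, 2.1)
)

# Manual calculation for one group, as in the question
uk_male <- subset(df, country == "United Kingdom" & gender == "Male")$height
manual_mean <- round(mean(uk_male), 2)
manual_sd <- round(sd(uk_male), 2)

# sd() of a single value is NA, which is why groups with one row need replace_na
single_sd <- sd(1.8)

print(c(mean = manual_mean, sd = manual_sd))
print(single_sd)
```

The rounded values should agree with the United Kingdom / Male row of calcs.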
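You can then glue the formatted strings together, drop the columns you no longer need, and optionally put the result into a print format. tidyr::complete gives you NA values for the group combinations that do not appear in the data.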
formatted <- calcs %>%
  # sd is in the same units as the mean, so print it as a plain number, not a percent
  mutate(height = stringr::str_glue("{mean_height} ± {sd_height}"),
         bball = stringr::str_glue("{scales::percent(share_bball, accuracy = 1)} ({count_bball} / {n})")) %>%
  tidyr::complete(country, gender) %>%
  select(country, gender, height, bball)
knitr::kable(formatted)
|country        |gender |height      |bball        |
|:--------------|:------|:-----------|:------------|
|United Kingdom |Male   |1.95 ± 0.07 |50% (1 / 2)  |
|United Kingdom |Female |1.8 ± 0     |100% (1 / 1) |
|Canada         |Male   |NA          |NA           |
|Canada         |Female |1.8 ± 0.14  |50% (1 / 2)  |
|Germany        |Male   |2.1 ± 0     |100% (1 / 1) |
|Germany        |Female |NA          |NA           |
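The question mentions that NA cells may be blank. One way to get that, sketched here on a small stand-in data frame (toy and out are hypothetical names, and the rows are only a sample of the table above), is the knitr.kable.NA option, which controls how kable prints missing values.

```r
library(knitr)

# Small stand-in for the formatted table, with one missing group combination
toy <- data.frame(
  country = c("Canada", "Canada"),
  gender = c("Male", "Female"),
  bball = c(NA, "50% (1 / 2)")
)

# Print NA cells as empty strings, then restore the previous option value
old <- options(knitr.kable.NA = "")
out <- kable(toy)
options(old)

print(out)
```

If you would rather leave the missing combinations out of the table entirely, skipping the tidyr::complete step has that effect, since those groups never appear in the summarised data.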