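I am trying to calculate the incidence/percentage of a binary variable within each level of a variable that has 5 (+1 NA) distinct income brackets. I am using: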
afghan %>% group_by(income) %>%
summarize(violent.exp.ISAF = n()) %>%
mutate(Percentage = violent.exp.ISAF/sum(violent.exp.ISAF)*100)
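But this gives me an overall percentage relative to the whole table, rather than the percentage within each specific income bracket, as shown below: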
# income violent.exp.taliban Percentage
# <chr> <int> <dbl>
#1 10,001-20,000 616 22.4
#2 2,001-10,000 1420 51.6
#3 20,001-30,000 93 3.38
#4 less than 2,000 457 16.6
#5 over 30,000 14 0.508
#6 NA 154 5.59
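I would like the percentage of the binary variable computed within each specific income bracket. Any suggestions?
A sample of the afghan dataset: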
> dput(head(afghan))
structure(list(province = c("Logar", "Logar", "Logar", "Logar",
"Logar", "Logar"), district = c("Baraki Barak", "Baraki Barak",
"Baraki Barak", "Baraki Barak", "Baraki Barak", "Baraki Barak"
), village.id = c(80, 80, 80, 80, 80, 80), age = c(26, 49, 60,
34, 21, 18), educ.years = c(10, 3, 0, 14, 12, 10), employed = c(0,
1, 1, 1, 1, 1), income = c("2,001-10,000", "2,001-10,000", "2,001-10,000",
"2,001-10,000", "2,001-10,000", NA), violent.exp.ISAF = c(0,
0, 1, 0, 0, 0), violent.exp.taliban = c(0, 0, 0, 0, 0, 0), list.group = c("control",
"control", "control", "ISAF", "ISAF", "ISAF"), list.response = c(0,
1, 1, 3, 3, 2)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Using dplyr / tidyverse and janitor, you can do the following:
library(tidyverse)
library(janitor)
afghan %>%
group_by(income) %>%
tabyl(income, violent.exp.ISAF) %>%
adorn_percentages() %>%
adorn_pct_formatting()
This shows the percentage distribution of violent.exp.ISAF within each level of income (based on the six sample rows):
income 0 1
2,001-10,000 80.0% 20.0%
<NA> 100.0% 0.0%
To store the result as a tibble:
afghan_tibble <- afghan %>%
group_by(income) %>%
tabyl(income, violent.exp.ISAF) %>%
adorn_percentages() %>%
adorn_pct_formatting() %>%
as_tibble()
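If you would rather stay within dplyr alone, a minimal sketch of the same idea follows. It assumes violent.exp.ISAF is coded 0/1, so the group mean is the share of respondents coded 1. The names afghan_head, income_rates, n_exposed and n are made up for illustration, and the toy data only mirrors the six sample rows from the question.

```r
library(dplyr)

# Toy data mirroring the six rows of dput(head(afghan)) in the question
afghan_head <- tibble(
  income = c(rep("2,001-10,000", 5), NA),
  violent.exp.ISAF = c(0, 0, 1, 0, 0, 0)
)

income_rates <- afghan_head %>%
  group_by(income) %>%
  summarize(
    n = n(),                                          # rows in this income bracket
    n_exposed = sum(violent.exp.ISAF, na.rm = TRUE),  # rows coded 1 in this bracket
    # mean of a 0/1 variable = proportion of 1s among non-missing values
    Percentage = mean(violent.exp.ISAF, na.rm = TRUE) * 100
  )

print(income_rates)
```

The original attempt divided each bracket's row count by the total row count, which is why it returned each bracket's share of the whole table. Taking the mean inside each group keeps the denominator within the bracket.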