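I have data with several binary variables, and I want to compute the proportion of each variable, grouped by another variable.
I surveyed people and asked them:
Please mark which of the following fruits you like (you may mark more than one):
☐ banana ☐ apple ☐ orange ☐ strawberry ☐ peach
Each checked box is recorded as 1 in the data, and each box left empty is recorded as 0. The data look like this: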
library(dplyr)
set.seed(2021)
my_df <-
matrix(rbinom(n = 100, size = 1, prob = runif(1)), ncol = 5) %>%
as.data.frame() %>%
cbind(1:20, ., sample(c("male", "female"), size = 20, replace = TRUE)) %>%
setNames(c("person_id", "banana", "apple", "orange", "strawberry", "peach", "gender"))
my_df
#> person_id banana apple orange strawberry peach gender
#> 1 1 1 1 1 0 0 female
#> 2 2 1 0 0 0 1 female
#> 3 3 0 0 1 0 1 female
#> 4 4 1 1 0 1 0 female
#> 5 5 1 1 1 0 0 male
#> 6 6 1 1 1 0 1 female
#> 7 7 0 1 0 1 1 male
#> 8 8 1 1 0 0 0 male
#> 9 9 1 1 1 0 0 female
#> 10 10 0 0 0 0 0 male
#> 11 11 1 1 1 1 1 male
#> 12 12 1 1 0 0 1 male
#> 13 13 1 1 0 1 0 male
#> 14 14 1 1 0 0 0 male
#> 15 15 0 0 0 0 1 male
#> 16 16 0 1 0 0 1 male
#> 17 17 1 0 0 0 1 male
#> 18 18 1 1 1 1 1 male
#> 19 19 0 0 1 1 1 female
#> 20 20 0 0 0 0 0 female
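Created on 2021-02-01 by the reprex package (v0.3.0)
I want to get the proportion of each fruit, broken down by gender. From this answer, I learned how to do it for a single variable (e.g., banana):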
my_df %>%
group_by(gender) %>%
summarise(n_of_observations = n(), prop = sum(banana == 1)/n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
## gender n_of_observations prop
## <chr> <int> <dbl>
## 1 female 10 0.6
## 2 male 10 0.4
But how do I get a table like this for all the fruits?
Desired output:
## fruit gender prop
## <chr> <chr> <dbl>
## 1 banana female 0.6
## 2 banana male 0.4
## 3 apple female 0.4
## 4 apple male 0.3
## 5 orange female 0.3
## 6 orange male 0.1
## 7 strawberry female 0.4
## 8 strawberry male 0.4
## 9 peach female 0.3
## 10 peach male 0.6
If possible, I'm looking for a dplyr solution. Thanks a lot!
You can pivot the data to long format with tidyr first and then summarize it:
library(tidyr)
tidyr::pivot_longer(my_df, banana:peach,
names_to = "fruit") %>%
dplyr::group_by(gender, fruit) %>%
dplyr::summarize(prop = sum(value) / n())
gender fruit prop
<chr> <chr> <dbl>
1 female apple 0.5
2 female banana 0.625
3 female orange 0.625
4 female peach 0.5
5 female strawberry 0.25
6 male apple 0.75
7 male banana 0.667
8 male orange 0.25
9 male peach 0.583
10 male strawberry 0.333
If you want the result sorted by fruit, you can pipe it to arrange. You can also add the number of observations inside the summarize call with n = n().
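As a minimal sketch of those two suggestions (sorting by fruit and adding the group size), the pipeline might look like the following. The data frame is typed in by hand from the table printed in the question rather than regenerated from the random seed, and the names fruit_props and n as well as the .groups = "drop" argument are choices made for this sketch, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Survey data copied by hand from the table printed in the question
my_df <- data.frame(
  person_id  = 1:20,
  banana     = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0),
  apple      = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0),
  orange     = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0),
  strawberry = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0),
  peach      = c(0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0),
  gender     = c("female", "female", "female", "female", "male", "female",
                 "male", "male", "female", "male", "male", "male", "male",
                 "male", "male", "male", "male", "male", "female", "female"),
  stringsAsFactors = FALSE
)

fruit_props <- my_df %>%
  # One row per person and fruit; the 0/1 answers land in the default "value" column
  pivot_longer(banana:peach, names_to = "fruit") %>%
  group_by(fruit, gender) %>%
  # Proportion of 1s per group, plus the number of observations in the group
  summarise(prop = sum(value) / n(), n = n(), .groups = "drop") %>%
  # Sort by fruit first, then by gender within each fruit
  arrange(fruit, gender)

print(fruit_props)
```

Because the answers are coded 0/1, mean(value) would give the same proportion as sum(value) / n(). With the data shown in the question, every female group has n = 8 and every male group has n = 12.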