Suppose I have the following data:
> summary_table[, c('condition_list', 'condition_count')]
# A tibble: 4,306 x 2
   condition_list             condition_count
   <chr>                                <int>
 1 true control,control email               2
 2 true control                             1
 3 treatment                                1
 4 true control                             1
 5 control email                            1
 6 control email                            1
 7 control email                            1
 8 control email,true control               2
 9 treatment                                1
10 control email                            1
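Note that the "condition_list" column consists of comma-delimited strings indicating assignment to certain conditions, but some of these assignments are isomorphic: the same set of conditions listed in a different order. I want to count the rows for each condition set, along the lines of: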
summary_table %>%
  group_by(condition_list) %>%
  summarize(n = n())
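However, this treats every distinct ordering in condition_list as a separate group. I want it to treat "control email,true control" the same as "true control,control email". What is the best way to do this?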
> dput(dputter)
structure(list(condition_list = c("true control,control email",
"true control", "treatment", "true control", "control email",
"control email", "control email", "control email,true control",
"treatment", "control email", "true control,treatment", "treatment,true control",
"treatment,true control,control email", "control email", "treatment",
"true control,control email", "control email", "treatment", "true control,treatment",
"control email", "control email,true control", "treatment", "control email",
"control email", "control email,true control", "control email",
"control email", "true control", "treatment", "true control",
"treatment", "true control", "true control", "control email",
"true control", "control email", "control email", "true control",
"treatment", "treatment,true control,control email", "true control",
"true control", "treatment,control email", "true control", "true control",
"control email", "control email", "treatment", "control email",
"true control"), condition_count = c(2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -50L))
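To see why grouping on the raw string falls short, here is a minimal sketch using a hypothetical three-row tibble `toy` (not part of the original data). Two of its rows list the same conditions in a different order, yet `group_by()` compares the strings character by character, so it keeps them apart.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: rows 1 and 2 describe the same set of conditions,
# just written in a different order
toy <- tibble(
  condition_list = c("true control,control email",
                     "control email,true control",
                     "treatment")
)

# Grouping on the raw string is order-sensitive, so this produces
# 3 groups (each with n = 1) instead of the 2 groups the question wants
raw_counts <- toy %>%
  group_by(condition_list) %>%
  summarize(n = n())

raw_counts
```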
Here is a tidyverse solution:
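library(tidyverse)
summary_table %>%
  mutate(condition_list =
           strsplit(condition_list, ",") %>%  # split each string into its conditions
           map(sort) %>%                      # sort so the order no longer matters
           map_chr(paste, collapse = ",")     # glue back into one canonical string
  ) %>%
  group_by(condition_list) %>%
  tally()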
# A tibble: 7 x 2
# condition_list n
# <chr> <int>
#1 control email 17
#2 control email,treatment 1
#3 control email,treatment,true control 2
#4 control email,true control 5
#5 treatment 9
#6 treatment,true control 3
#7 true control 13
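If some entries might contain a space after the comma (for example "control email, true control"), splitting on "," alone leaves a leading space on the second element, so " true control" and "true control" would still be counted as different conditions. Below is a minimal sketch of a more tolerant variant, assuming the only stray characters are spaces around the commas. `canonical_key()` and the `toy` data are hypothetical names for illustration; `count()` is dplyr's shorthand for `group_by()` followed by `tally()`.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: turn each comma-delimited string into an
# order-independent key by splitting on commas (swallowing any spaces
# around them), trimming leftover edge spaces, sorting, and re-joining
canonical_key <- function(x) {
  strsplit(x, "\\s*,\\s*") %>%
    map(trimws) %>%
    map(sort) %>%
    map_chr(paste, collapse = ",")
}

# Hypothetical toy data mixing different orderings and spacing
toy <- tibble(
  condition_list = c("true control,control email",
                     "control email, true control",
                     "treatment",
                     "treatment , true control")
)

# Replace each string by its canonical key, then count rows per key
key_counts <- toy %>%
  mutate(condition_list = canonical_key(condition_list)) %>%
  count(condition_list)

key_counts
```

On this toy data the first two rows collapse into a single group, "control email,true control", with n = 2. Applying it to summary_table is the same pipeline as the answer, with canonical_key() used inside the mutate() step.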