Group by a column of comma-separated strings, ignoring the order of the items within each string

Parseltongue

Suppose I have the following data:

> summary_table[, c('condition_list', 'condition_count')]
# A tibble: 4,306 x 2
   condition_list             condition_count
   <chr>                                <int>
 1 true control,control email               2
 2 true control,control email               1
 3 treatment, control email                 1
 4 true control, control email              1
 5 control email, true control              1
 6 control email                            1
 7 control email, treatment                 1
 8 control email,true control               2
 9 treatment                                1
10 control email, true control              1

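Note that the "condition_list" column consists of comma-delimited strings that indicate assignment to certain conditions, but some of these assignments are equivalent: the same conditions, listed in a different order. I want to get the number of rows for each condition, like so: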

summary_table %>% group_by(condition_list) %>%
  summarize(n= n())

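However, this treats every particular ordering in condition_list as a separate group. I want "control email,true control" to be treated the same as "true control,control email". What is the best way to do this?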

> dput(dputter)
structure(list(condition_list = c("true control,control email", 
"true control", "treatment", "true control", "control email", 
"control email", "control email", "control email,true control", 
"treatment", "control email", "true control,treatment", "treatment,true control", 
"treatment,true control,control email", "control email", "treatment", 
"true control,control email", "control email", "treatment", "true control,treatment", 
"control email", "control email,true control", "treatment", "control email", 
"control email", "control email,true control", "control email", 
"control email", "true control", "treatment", "true control", 
"treatment", "true control", "true control", "control email", 
"true control", "control email", "control email", "true control", 
"treatment", "treatment,true control,control email", "true control", 
"true control", "treatment,control email", "true control", "true control", 
"control email", "control email", "treatment", "control email", 
"true control"), condition_count = c(2L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -50L))
DiceboyT

Here is a tidyverse solution:

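library(tidyverse)

summary_table %>%
  # Build an order-independent key: split each string on commas,
  # sort the pieces alphabetically, and paste them back together
  mutate(condition_list =
           strsplit(condition_list, ",") %>%
           map(sort) %>%
           map_chr(paste, collapse = ",")
         ) %>%
  # Count the rows that share the same normalized key
  group_by(condition_list) %>%
  tally()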
# A tibble: 7 x 2
#  condition_list                           n
#  <chr>                                <int>
#1 control email                           17
#2 control email,treatment                  1
#3 control email,treatment,true control     2
#4 control email,true control               5
#5 treatment                                9
#6 treatment,true control                   3
#7 true control                            13
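Splitting on "," alone assumes there is no space after each comma. That holds for the dput() sample, but not for some rows of the printed tibble (e.g. "control email, true control"), where " true control" with a leading space would end up as its own key. A minimal base R sketch, assuming whitespace around each item is insignificant, trims every piece before sorting. The input vector below is a hypothetical sample made up to mix both separators, and the names canonical and counts are illustrative only.

```r
# Hypothetical sample mixing "," and ", " separators (not the asker's data)
condition_list <- c("true control,control email",
                    "control email, true control",
                    "treatment, control email",
                    "control email",
                    "control email,treatment")

# Order-independent key: split on commas, trim surrounding whitespace,
# sort the pieces, and paste them back together with a bare comma
canonical <- vapply(
  strsplit(condition_list, ","),
  function(parts) paste(sort(trimws(parts)), collapse = ","),
  character(1)
)

# Number of rows per normalized key (base R analogue of group_by() + tally())
counts <- table(canonical)
print(counts)
```

Within the tidyverse pipeline from the answer, the same trimming should be achievable by replacing map(sort) with map(~ sort(trimws(.x))).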
