How to generate a new ID based on aggregated data in R

user3354212

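I have a data frame similar to the output in an already-answered question about aggregating data by group.

However, I want to create a new ID column based on the number of unique clusters in DF.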

DF = read.table(text="name cluster count
       F00851.3     20      2
       F00851.2     20      2
         F00851     20      2
       F00851.8     20      2
       F00851.4     20      2
       F00851.5     20      2
       F00851.1     20      2
       F00851.6     21      2
       F00851.7     21      2
       F00958.2     23      1
       F00958.1     23      1
       F00958.3     23      1
         F00958     23      1
       F01404.5     28      3
         F01404     28      3
       F01404.4     28      3
       F01404.3     28      3
       F01404.6     29      3
       F01404.1     29      3
       F01404.7     29      3
       F01404.2     30      3
       F01404.8     30      3", header=TRUE, stringsAsFactors=FALSE)

Expected result:

result = read.table(text="name  cluster count   ID
    F00851.3    20  2   F00851.1
    F00851.2    20  2   F00851.1
    F00851  20  2   F00851.1
    F00851.8    20  2   F00851.1
    F00851.4    20  2   F00851.1
    F00851.5    20  2   F00851.1
    F00851.1    20  2   F00851.1
    F00851.6    21  2   F00851.2
    F00851.7    21  2   F00851.2
    F00958.2    23  1   F00958.1
    F00958.1    23  1   F00958.1
    F00958.3    23  1   F00958.1
    F00958  23  1   F00958.1
    F01404.5    28  3   F01404.1
    F01404  28  3   F01404.1
    F01404.4    28  3   F01404.1
    F01404.3    28  3   F01404.1
    F01404.6    29  3   F01404.2
    F01404.1    29  3   F01404.2
    F01404.7    29  3   F01404.2
    F01404.2    30  3   F01404.3
    F01404.8    30  3   F01404.3", header=TRUE, stringsAsFactors=FALSE)

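In my case, the group is substr(DF$name,1,6), so the new ID column should be substr(DF$name,1,6) plus an extension, separated by a dot. The extension is the sequence number of the unique cluster values within each group.
Any help is appreciated.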
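The rule stated in the question (group by substr(name, 1, 6), then number the distinct cluster values within each group) can also be expressed in base R without extra packages. This is a minimal sketch. It assumes cluster is numeric and that "sequence number" means order of first appearance within the group. The question's data is rebuilt with data.frame() only so the block runs on its own:

```r
# Question's data, rebuilt with data.frame() so this block is self-contained
DF <- data.frame(
  name = c("F00851.3", "F00851.2", "F00851", "F00851.8", "F00851.4",
           "F00851.5", "F00851.1", "F00851.6", "F00851.7",
           "F00958.2", "F00958.1", "F00958.3", "F00958",
           "F01404.5", "F01404", "F01404.4", "F01404.3",
           "F01404.6", "F01404.1", "F01404.7", "F01404.2", "F01404.8"),
  cluster = c(20, 20, 20, 20, 20, 20, 20, 21, 21,
              23, 23, 23, 23,
              28, 28, 28, 28, 29, 29, 29, 30, 30),
  count = c(2, 2, 2, 2, 2, 2, 2, 2, 2,
            1, 1, 1, 1,
            3, 3, 3, 3, 3, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Group key as defined in the question: the first six characters of name
prefix <- substr(DF$name, 1, 6)

# Within each prefix, number the distinct cluster values by order of first appearance
ext <- ave(DF$cluster, prefix, FUN = function(x) match(x, unique(x)))

# Join prefix and extension with a dot, e.g. "F00851" + 2 -> "F00851.2"
DF$ID <- paste(prefix, ext, sep = ".")
DF
```

Because match(x, unique(x)) numbers distinct values rather than runs, this approach does not require the rows of one cluster to be contiguous.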

akrun

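We can do this easily with data.table. Convert the 'data.frame' to a 'data.table' (setDT(DF)) and group by 'count'. Then take the run-length id of 'cluster' (rleid) and paste it, separated by a dot, onto the part of 'name' before the first dot to create 'ID'. That part is extracted with sub; substr(name, 1, 6) would also work for this data.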

library(data.table)
# Group by count; strip any ".N" suffix from name and append the run-length id of cluster
setDT(DF)[, ID := paste(sub("\\..*$", "", name), rleid(cluster), sep="."), by = count]
DF
#        name cluster count       ID
# 1: F00851.3      20     2 F00851.1
# 2: F00851.2      20     2 F00851.1
# 3:   F00851      20     2 F00851.1
# 4: F00851.8      20     2 F00851.1
# 5: F00851.4      20     2 F00851.1
# 6: F00851.5      20     2 F00851.1
# 7: F00851.1      20     2 F00851.1
# 8: F00851.6      21     2 F00851.2
# 9: F00851.7      21     2 F00851.2
#10: F00958.2      23     1 F00958.1
#11: F00958.1      23     1 F00958.1
#12: F00958.3      23     1 F00958.1
#13:   F00958      23     1 F00958.1
#14: F01404.5      28     3 F01404.1
#15:   F01404      28     3 F01404.1
#16: F01404.4      28     3 F01404.1
#17: F01404.3      28     3 F01404.1
#18: F01404.6      29     3 F01404.2
#19: F01404.1      29     3 F01404.2
#20: F01404.7      29     3 F01404.2
#21: F01404.2      30     3 F01404.3
#22: F01404.8      30     3 F01404.3
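Two caveats apply to the answer above. First, it groups by count. That works here only because each six-character prefix happens to have its own count value. Grouping by the prefix itself follows the question's definition more directly. Second, rleid() numbers runs of cluster, not distinct values. It therefore reproduces the expected result only when each cluster's rows are contiguous within a group. The sketch below uses hypothetical toy data in which cluster 20 reappears after 21, to show where the two numberings differ:

```r
library(data.table)

# Hypothetical toy data (not from the question): cluster 20 reappears after 21
toy <- data.table(name    = c("F00851.1", "F00851.2", "F00851.3", "F00851.4"),
                  cluster = c(20, 21, 20, 21))

# Group key as defined in the question
toy[, prefix := substr(name, 1, 6)]

# rleid(): a new number every time cluster changes -> 1, 2, 3, 4
toy[, ID_rleid := paste(prefix, rleid(cluster), sep = "."), by = prefix]

# match(x, unique(x)): one number per distinct cluster value -> 1, 2, 1, 2
toy[, ID_unique := paste(prefix, match(cluster, unique(cluster)), sep = "."), by = prefix]

toy
```

If clusters can be interleaved within a group, match(cluster, unique(cluster)) is the safer extension. On the question's DF, where each cluster's rows are contiguous, both give the same IDs.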
