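I have a data frame similar to the output in the answered question about summarizing data by group,
but I want to create a new ID column based on the number of unique clusters in DF.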
DF = read.table(text="name cluster count
F00851.3 20 2
F00851.2 20 2
F00851 20 2
F00851.8 20 2
F00851.4 20 2
F00851.5 20 2
F00851.1 20 2
F00851.6 21 2
F00851.7 21 2
F00958.2 23 1
F00958.1 23 1
F00958.3 23 1
F00958 23 1
F01404.5 28 3
F01404 28 3
F01404.4 28 3
F01404.3 28 3
F01404.6 29 3
F01404.1 29 3
F01404.7 29 3
F01404.2 30 3
F01404.8 30 3", header=T, stringsAsFactors=F)
Expected result:
result = read.table(text="name cluster count ID
F00851.3 20 2 F00851.1
F00851.2 20 2 F00851.1
F00851 20 2 F00851.1
F00851.8 20 2 F00851.1
F00851.4 20 2 F00851.1
F00851.5 20 2 F00851.1
F00851.1 20 2 F00851.1
F00851.6 21 2 F00851.2
F00851.7 21 2 F00851.2
F00958.2 23 1 F00958.1
F00958.1 23 1 F00958.1
F00958.3 23 1 F00958.1
F00958 23 1 F00958.1
F01404.5 28 3 F01404.1
F01404 28 3 F01404.1
F01404.4 28 3 F01404.1
F01404.3 28 3 F01404.1
F01404.6 29 3 F01404.2
F01404.1 29 3 F01404.2
F01404.7 29 3 F01404.2
F01404.2 30 3 F01404.3
F01404.8 30 3 F01404.3", header=T, stringsAsFactors=F)
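In my case, the group is substr(DF$name,1,6). The new ID column should therefore be substr(DF$name,1,6) plus an extension, separated by a dot. The extension is the sequence number of each unique value of the cluster column within its group.
Any help is appreciated.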
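We can do this easily with data.table. Convert the 'data.frame' to a 'data.table' (setDT(DF)), group by 'count', get the run-length ID of 'cluster' (rleid), and paste it onto the substring of 'name' obtained with sub or substr(name, 1, 6) to create the 'ID' column.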
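To make the two building blocks concrete before they are combined, here is a small sketch on toy vectors. The vectors `clusters` and `names_in` are made up for illustration and are not part of the original data; only `rleid` and `sub` come from the answer.

```r
library(data.table)

# Hypothetical toy input: cluster values as they would appear within one group
clusters <- c(20, 20, 21, 23, 23)

# rleid() numbers consecutive runs of equal values: 1 for the first run,
# 2 for the second, and so on
run_ids <- rleid(clusters)
run_ids
# [1] 1 1 2 3 3

# Hypothetical toy names: strip everything from the first dot onward
names_in <- c("F00851.3", "F00851", "F01404.5")
prefixes <- sub("\\..*$", "", names_in)
prefixes
# [1] "F00851" "F00851" "F01404"
```

Note that `rleid` counts runs, so it matches "sequence number of the unique cluster" only when equal cluster values are adjacent within a group, as they are in the posted data.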
library(data.table)
setDT(DF)[, ID := paste(sub("\\..*$", "", name), rleid(cluster), sep="."), count]
DF
# name cluster count ID
# 1: F00851.3 20 2 F00851.1
# 2: F00851.2 20 2 F00851.1
# 3: F00851 20 2 F00851.1
# 4: F00851.8 20 2 F00851.1
# 5: F00851.4 20 2 F00851.1
# 6: F00851.5 20 2 F00851.1
# 7: F00851.1 20 2 F00851.1
# 8: F00851.6 21 2 F00851.2
# 9: F00851.7 21 2 F00851.2
#10: F00958.2 23 1 F00958.1
#11: F00958.1 23 1 F00958.1
#12: F00958.3 23 1 F00958.1
#13: F00958 23 1 F00958.1
#14: F01404.5 28 3 F01404.1
#15: F01404 28 3 F01404.1
#16: F01404.4 28 3 F01404.1
#17: F01404.3 28 3 F01404.1
#18: F01404.6 29 3 F01404.2
#19: F01404.1 29 3 F01404.2
#20: F01404.7 29 3 F01404.2
#21: F01404.2 30 3 F01404.3
#22: F01404.8 30 3 F01404.3
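For comparison, the same ID can probably be built in base R without data.table. The sketch below is an alternative to the answer's method, not a restatement of it: it groups by the name prefix that the question describes, rather than by 'count', and numbers clusters by order of first appearance with `match(x, unique(x))`. The data frame `toy` is a shortened, illustrative subset of DF.

```r
# Illustrative subset of the question's data (assumption: same column layout)
toy <- read.table(text = "name cluster count
F00851.3 20 2
F00851.6 21 2
F00958.2 23 1
F01404.5 28 3
F01404.6 29 3
F01404.2 30 3
F01404.8 30 3", header = TRUE, stringsAsFactors = FALSE)

# Group key as defined in the question: the first six characters of name
grp <- substr(toy$name, 1, 6)

# Within each group, number each cluster by the position of its first appearance
seq_in_grp <- ave(toy$cluster, grp, FUN = function(x) match(x, unique(x)))

toy$ID <- paste(grp, seq_in_grp, sep = ".")
toy$ID
# [1] "F00851.1" "F00851.2" "F00958.1" "F01404.1" "F01404.2" "F01404.3" "F01404.3"
```

Unlike a run-length ID, `match(x, unique(x))` gives a repeated cluster the same number even when its rows are not adjacent, so the two approaches agree only when each cluster's rows are contiguous within a group.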