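In an existing csv, one column contains one of the following codes in each row, e.g. outer membrane [GO:0019867]. I want to add a column to the csv that gives each row a category, e.g. OuterMembrane. So I added an empty column, and I want to build this list so that the general category is filled in automatically whenever one of its codes appears in the csv. (Not all codes are included.)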
categ <- list(
  OuterMembrane = c("outer membrane [GO:0019867]", "cell outer membrane [GO:0009279]",
                    "integral component of membrane [GO:0016021]", "membrane [GO:0016020]"),
  Cytoplasmic   = c("ribosome [GO:0005840]", "cytoplasm [GO:0005737]"),
  Extracellular = c(),
  InnerMembrane = c("plasma membrane [GO:0005886]", "membrane [GO:0016020]"),
  Periplasmic   = c("periplasmic space [GO:0042597]"),
  CellWall      = c(),
  Vacuole       = c(),
  Lipoproteins  = c())
csv1 <- csv1 %>%
  add_column("Subcellular Localization" = NA)
for (row in (categ)){
if row(categ) %in% csv1{
………………??????
The following for loop may help with your problem.
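csv1['subcellular_localization'] <- NA  # add a new, empty column
for (i in seq_len(nrow(csv1))) {        # fill in the new column row by row
  for (j in seq_along(categ)) {
    # assign the category name when the row's term is listed under that category;
    # a term listed under two categories ends up with the later one
    if (csv1$cell_comp[i] %in% categ[[j]]) {
      csv1$subcellular_localization[i] <- names(categ)[j]
    }
  }
}
csv1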
Input:
> csv1
  name                      cell_comp
1   p1    outer membrane [GO:0019867]
2   p2         cytoplasm [GO:0005737]
3   p3 periplasmic space [GO:0042597]
Output:
> csv1
  name                      cell_comp subcellular_localization
1   p1    outer membrane [GO:0019867]            OuterMembrane
2   p2         cytoplasm [GO:0005737]              Cytoplasmic
3   p3 periplasmic space [GO:0042597]              Periplasmic
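As a loop-free alternative, the named list can be flattened into a lookup table and matched against the column in one step. This is only a minimal sketch, assuming the column is called cell_comp as in the example above and that each row holds exactly one term; the name lookup is introduced here for illustration. Note that match() returns the first hit, so a term listed under two categories (such as "membrane [GO:0016020]") would get the first category, whereas the loop keeps the last one.

```r
# Example data, re-created here so the sketch runs on its own
# (the empty categories are left out for brevity)
categ <- list(
  OuterMembrane = c("outer membrane [GO:0019867]", "cell outer membrane [GO:0009279]",
                    "integral component of membrane [GO:0016021]", "membrane [GO:0016020]"),
  Cytoplasmic   = c("ribosome [GO:0005840]", "cytoplasm [GO:0005737]"),
  InnerMembrane = c("plasma membrane [GO:0005886]", "membrane [GO:0016020]"),
  Periplasmic   = c("periplasmic space [GO:0042597]")
)
csv1 <- data.frame(
  name = c("p1", "p2", "p3"),
  cell_comp = c("outer membrane [GO:0019867]", "cytoplasm [GO:0005737]",
                "periplasmic space [GO:0042597]"),
  stringsAsFactors = FALSE
)

# Flatten the list into a two-column table: values = GO term, ind = category name
lookup <- stack(categ)

# match() gives the position of the first matching term, or NA when the term is not listed
csv1$subcellular_localization <-
  as.character(lookup$ind)[match(csv1$cell_comp, lookup$values)]
```

Rows whose term is not in any category are left as NA, which matches the behaviour of the loop.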
Edit
If each protein has several cellular components, you can use a for loop of the following form (using the stringr library):
library(stringr)
csv1$subcellular_localization <- NA  # start from an empty column
for (i in seq_len(nrow(csv1))) {
  # split the cell on ";" and trim whitespace around each component
  components <- str_trim(unlist(strsplit(csv1$cell_comp[i], ';')))
  found <- character(0)  # categories matched so far for this row
  for (component in components) {
    for (j in seq_along(categ)) {
      if (component %in% categ[[j]]) {
        found <- union(found, names(categ)[j])  # union() avoids duplicates
      }
    }
  }
  if (length(found) > 0) {
    csv1$subcellular_localization[i] <- paste(found, collapse = "; ")
  }
}
Input:
> csv1
  name                                                                cell_comp
1   p1 outer membrane [GO:0019867]; integral component of membrane [GO:0016021]
2   p2                   cytoplasm [GO:0005737]; periplasmic space [GO:0042597]
3   p3                                           periplasmic space [GO:0042597]
Output:
> csv1
  name                                                                cell_comp subcellular_localization
1   p1 outer membrane [GO:0019867]; integral component of membrane [GO:0016021]            OuterMembrane
2   p2                   cytoplasm [GO:0005737]; periplasmic space [GO:0042597] Cytoplasmic; Periplasmic
3   p3                                           periplasmic space [GO:0042597]              Periplasmic
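The multi-component case can also be written without nested loops, using a named vector as a term-to-category lookup and a small helper applied to each cell. This is a sketch under stated assumptions, not a drop-in replacement: term_to_cat and classify are hypothetical names introduced here, components are assumed to be separated by ";", and a term listed under two categories is assigned to the first one.

```r
# Example data, re-created here so the sketch runs on its own
categ <- list(
  OuterMembrane = c("outer membrane [GO:0019867]", "cell outer membrane [GO:0009279]",
                    "integral component of membrane [GO:0016021]", "membrane [GO:0016020]"),
  Cytoplasmic   = c("ribosome [GO:0005840]", "cytoplasm [GO:0005737]"),
  InnerMembrane = c("plasma membrane [GO:0005886]", "membrane [GO:0016020]"),
  Periplasmic   = c("periplasmic space [GO:0042597]")
)
csv1 <- data.frame(
  name = c("p1", "p2", "p3"),
  cell_comp = c("outer membrane [GO:0019867]; integral component of membrane [GO:0016021]",
                "cytoplasm [GO:0005737]; periplasmic space [GO:0042597]",
                "periplasmic space [GO:0042597]"),
  stringsAsFactors = FALSE
)

# Named vector: names are GO terms, values are category names.
# Assumption: for a term listed twice, only its first category is kept.
lookup <- stack(categ)
lookup <- lookup[!duplicated(lookup$values), ]
term_to_cat <- setNames(as.character(lookup$ind), as.character(lookup$values))

# Hypothetical helper: map one semicolon-separated cell to its distinct categories
classify <- function(cell) {
  terms <- trimws(strsplit(cell, ";", fixed = TRUE)[[1]])
  cats <- unname(term_to_cat[terms])   # NA for terms that are not listed
  cats <- unique(cats[!is.na(cats)])   # drop unmatched terms and duplicates
  if (length(cats) == 0) NA_character_ else paste(cats, collapse = "; ")
}

csv1$subcellular_localization <-
  vapply(csv1$cell_comp, classify, character(1), USE.NAMES = FALSE)
```

Categories appear in the order in which their terms occur in the cell, and cells with no listed term stay NA.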