I have two columns that look like this:
V1 V2
ENSP00000222573_N559D ENSG00000105855
ENSP00000222573_N559D ENSG00000105855
ENSP00000267853_E337* ENSG00000108239
ENSP00000299441_R1672P,R1672G ENSG00000127415
ENSP00000334642_K277N. ENSG00000134324
ENSP00000342952_N585R ENSG00000134324
First, I need to extract from the first column all the letters/symbols after the _, so the result should look like this:
V1 V2
ND ENSG00000105855
ND ENSG00000105855
E* ENSG00000108239
RP,RG ENSG00000127415
KN ENSG00000134324
NR ENSG00000134324
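Then I want to filter so that a row is dropped only when both its V1 and V2 values are repeated, i.e. the whole row is a duplicate. The final result would be: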
V1 V2
ND ENSG00000105855
E* ENSG00000108239
RP,RG ENSG00000127415
KN ENSG00000134324
NR ENSG00000134324
One option is to use sapply and strsplit:
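library(dplyr)  # provides %>% and distinct()

sapply(df, function(x) {
  # split each value on "_"; values without "_" (the V2 column) are returned unchanged
  sapply(strsplit(x, split = "_"), function(y) {
    if (length(y) < 2) {
      y
    } else {
      gsub("[0-9]+", "", y[2])  # drop the digits from the part after "_"
    }
  })
}) %>% as.data.frame() %>% distinct()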
# V1 V2
# 1 ND ENSG00000105855
# 2 E* ENSG00000108239
# 3 RP,RG ENSG00000127415
# 4 KN. ENSG00000134324
# 5 NR ENSG00000134324
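Note that row 4 above comes out as KN. rather than the KN shown in the question, because the input value ends in a period and only digits are removed. A possible base-R variant, sketched here under the assumption that only letters, "*" and "," should survive, is shown below. The helper name clean_mutation is hypothetical and not part of the original answer, and the data frame is rebuilt inline only to keep the example self-contained.

```r
# Hypothetical helper: keep only the mutation letters/symbols after the first "_".
clean_mutation <- function(x) {
  x <- sub("^[^_]*_", "", x)   # drop everything up to and including the first "_"
  gsub("[^A-Za-z*,]", "", x)   # assumption: keep only letters, "*" and ","
}

# Same values as the question's data, rebuilt inline.
df <- data.frame(
  V1 = c("ENSP00000222573_N559D", "ENSP00000222573_N559D",
         "ENSP00000267853_E337*", "ENSP00000299441_R1672P,R1672G",
         "ENSP00000334642_K277N.", "ENSP00000342952_N585R"),
  V2 = c("ENSG00000105855", "ENSG00000105855", "ENSG00000108239",
         "ENSG00000127415", "ENSG00000134324", "ENSG00000134324"),
  stringsAsFactors = FALSE
)

df$V1 <- clean_mutation(df$V1)
result <- unique(df)  # drops rows in which V1 and V2 are both repeated
```

Because sub() and gsub() are vectorised, no nested sapply is needed, and unique() on a data frame compares whole rows, so dplyr is not required either.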
Data:
df <- read.table(text =
"V1 V2
ENSP00000222573_N559D ENSG00000105855
ENSP00000222573_N559D ENSG00000105855
ENSP00000267853_E337* ENSG00000108239
ENSP00000299441_R1672P,R1672G ENSG00000127415
ENSP00000334642_K277N. ENSG00000134324
ENSP00000342952_N585R ENSG00000134324",
stringsAsFactors = FALSE, header = TRUE)
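To see why de-duplicating on whole rows keeps both ENSG00000134324 entries, it may help to compare duplicated() on the full data frame with duplicated() on a single column. The following is a small sketch on made-up toy values (toy, g1 and g2 are illustrative names, not from the question):

```r
# Toy data: rows 1 and 2 are identical; rows 3 and 4 share V2 but differ in V1.
toy <- data.frame(
  V1 = c("ND", "ND", "KN", "NR"),
  V2 = c("g1", "g1", "g2", "g2"),
  stringsAsFactors = FALSE
)

dup_rows <- duplicated(toy)     # TRUE only where the whole row repeats an earlier row
dup_v2   <- duplicated(toy$V2)  # TRUE wherever V2 alone repeats

kept <- toy[!dup_rows, ]        # whole-row filter keeps both g2 rows
```

Filtering on dup_v2 instead would also drop the NR row, which is not what the question asks for.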