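I have the following matrix of strings made up of 0s and 1s. Each column always contains the same number of strings, and a column holds at least two different strings. I want to delete a column when it meets both of these conditions: it contains only "10" and "01", and "01" occurs only once or twice. I want to keep all other columns:
r1 <- c("10","001","0001","01","100","10")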
r2 <- c("01","001","0001","10","100","10")
r3 <- c("10","100","1000","10","010","01")
r4 <- c("10","010","0100","10","001","10")
r5<- c("01","010","0010","10","001","10")
r6<- c("01","010","0010","10","001","01")
n.mat <- rbind(r1,r2,r3,r4,r5,r6)
Desired output:
r1 <- c("10","001","0001","100")
r2 <- c("01","001","0001","100")
r3 <- c("10","100","1000","010")
r4 <- c("10","010","0100","001")
r5<- c("01","010","0010","001")
r6<- c("01","010","0010","001")
n.mat <- rbind(r1,r2,r3,r4,r5,r6)
That is, columns 4 and 6 are removed.
My code so far is:
del_two<- function(x){
length(unique(x)) != 2
}
msa_protein.mat_1<-msa_protein.mat[, apply(msa_protein.mat, 2, del_two)]
But I am not sure how to add the if condition.
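You can add & to combine logical selections with "AND" logic. In this case, though, I think you want to remove these columns rather than keep them, so you need to negate the final selection with !: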
n.mat[, apply(n.mat, 2, FUN=function(x) !(length(unique(x)) == 2 & sum(x == '01') <= 2))]
Or even:
n.mat[, !apply(n.mat, 2, FUN=function(x) length(unique(x)) == 2 & sum(x == '01') <= 2)]
You can also express it as failing either of the logical conditions, joined with | for "OR" logic:
n.mat[, apply(n.mat, 2, FUN=function(x) length(unique(x)) != 2 | sum(x == '01') > 2)]
All of which give:
# [,1] [,2] [,3] [,4]
#r1 "10" "001" "0001" "100"
#r2 "01" "001" "0001" "100"
#r3 "10" "100" "1000" "010"
#r4 "10" "010" "0100" "001"
#r5 "01" "010" "0010" "001"
#r6 "01" "010" "0010" "001"
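If you would rather see which columns are being dropped before subsetting, the same test can be pulled out into a small named function. This is only a sketch: drop_col, rare and max_count are names made up for illustration, and it uses && because each call returns a single TRUE/FALSE.

```r
# Rebuild the example matrix from the question.
n.mat <- rbind(
  r1 = c("10", "001", "0001", "01", "100", "10"),
  r2 = c("01", "001", "0001", "10", "100", "10"),
  r3 = c("10", "100", "1000", "10", "010", "01"),
  r4 = c("10", "010", "0100", "10", "001", "10"),
  r5 = c("01", "010", "0010", "10", "001", "10"),
  r6 = c("01", "010", "0010", "10", "001", "01")
)

# Hypothetical helper: TRUE when a column has exactly two distinct
# strings and the string `rare` occurs at most `max_count` times.
drop_col <- function(x, rare = "01", max_count = 2) {
  length(unique(x)) == 2 && sum(x == rare) <= max_count
}

# vapply guarantees exactly one logical value per column.
to_drop <- vapply(seq_len(ncol(n.mat)),
                  function(j) drop_col(n.mat[, j]),
                  logical(1))

# drop = FALSE keeps the result a matrix even if one column survives.
result <- n.mat[, !to_drop, drop = FALSE]

which(to_drop)
# [1] 4 6
```

Keeping to_drop as its own vector makes it easy to check the selection on real data before anything is deleted.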
There may also be some trickier ways to do this using column sums, which could be faster if you have a lot of data, for example:
n.mat[, !(
(colSums(n.mat == "01") <= 2) &
colSums(matrix(n.mat %in% c("10","01"), nrow=nrow(n.mat), ncol=ncol(n.mat))) == nrow(n.mat)
)]
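As a rough check, the column-sum version can be compared against the apply version on the example data. In this sketch the %in% call is swapped for two == comparisons joined with |, which keeps the matrix shape so no matrix() wrapper is needed. The names only_10_01, rare_01, vectorised and by_apply are just for illustration. Note one assumed difference: the column-sum test would also drop a column made up entirely of "10", while the unique-count test would keep it; with at least two different strings per column, as in the question, that case should not come up.

```r
# Rebuild the example matrix from the question.
n.mat <- rbind(
  r1 = c("10", "001", "0001", "01", "100", "10"),
  r2 = c("01", "001", "0001", "10", "100", "10"),
  r3 = c("10", "100", "1000", "10", "010", "01"),
  r4 = c("10", "010", "0100", "10", "001", "10"),
  r5 = c("01", "010", "0010", "10", "001", "10"),
  r6 = c("01", "010", "0010", "10", "001", "01")
)

# TRUE for columns in which every entry is either "10" or "01".
only_10_01 <- colSums(n.mat == "10" | n.mat == "01") == nrow(n.mat)

# TRUE for columns in which "01" occurs at most twice.
rare_01 <- colSums(n.mat == "01") <= 2

vectorised <- n.mat[, !(only_10_01 & rare_01), drop = FALSE]

# The apply-based selection, for comparison.
by_apply <- n.mat[, !apply(n.mat, 2, function(x) {
  length(unique(x)) == 2 & sum(x == "01") <= 2
}), drop = FALSE]

identical(vectorised, by_apply)
# [1] TRUE
```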