Sort pairs of columns in a data frame according to another column with a custom order


I have searched online but could not find a solution to my problem. I have the following data:

df <- structure(list(V1 = c("ATP1A2", "CAPRIN1", "ATP1A1", "CBX3", 
"AUP1", "LARS2", "MTHFD1", "VDAC2", "PRKCSH", "ATP1B1", "B3GNT3", 
"", ""), V2 = c("ATP1A1", "ATP1A2", "ATP1B1", "AUP1", "B3GNT3", 
"CAPRIN1", "CAPRIN1", "CBX3", "", "", "", "", ""), V3 = c("220948_s_at", 
"203296_s_at", "201243_s_at", "220525_s_at", "204856_at", "200722_s_at", 
"200723_s_at", "200037_s_at", "", "", "", "", ""), V4 = c("LARS2", 
"MTHFD1", "PRKCSH", "PRKCSH", "VDAC2", "", "", "", "", "", "", 
"", ""), V5 = c("204016_at", "202309_at", "200707_at", "214080_x_at", 
"211662_s_at", "", "", "", "", "", "", "", "")), .Names = c("V1", 
"V2", "V3", "V4", "V5"), row.names = c(NA, -13L), class = "data.frame")

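What I want to do is sort columns 2 and 4 according to the first column. However, when the second column is reordered, the third column has to move along with V2, and the fifth column has to move along with V4.

The expected output is as follows: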

        V1      V2          V3     V4          V5
1   ATP1A2  ATP1A2 203296_s_at      -           -
2  CAPRIN1 CAPRIN1 200722_s_at      -           -
3        - CAPRIN1 200723_s_at      -           -
4   ATP1A1  ATP1A1 220948_s_at      -           -
5     CBX3    CBX3 200037_s_at      -           -
6     AUP1    AUP1 220525_s_at      -           -
7    LARS2       -           -  LARS2   204016_at
8   MTHFD1       -           - MTHFD1   202309_at
9    VDAC2       -           -  VDAC2 211662_s_at
10  PRKCSH       -           - PRKCSH   200707_at
11       -       -           - PRKCSH 214080_x_at
12  ATP1B1  ATP1B1 201243_s_at      -           -
13  B3GNT3  B3GNT3   204856_at      -           -

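I want to sort V2 and V4 based on V1. Note that V3 corresponds to V2 and V5 corresponds to V4; for example, if a value in V2 moves, the matching value in V3 moves with it, and so on.

I tried the following, but it did not help: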

df2<- df[with(df, order(V1)), ]

I also tried the following, which did not work either:

require(data.table)
df2 <- data.table(df, key="V1")

The following did not work either:

df2<- df[order(df$V1),]

康巴布

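I agree with zx8754: what you want is to merge (a join, in SQL terms) the pairs (V2, V3) and (V4, V5) onto V1, which would be more natural if your data had started out in separate tables. In both approaches below, first record the row numbers so that the original order of V1 can be restored at the end: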

    df$RowNum <- 1:nrow(df) # row numbers to sort at the end

1) Using data.table, which may be useful if you have huge tables:

    library("data.table")
    dt <- as.data.table(df)

    # your data are essentially three different tables
    # so let's split it up, removing useless empty cells
    ref <- dt[which(dt$V1!=""),c("V1","RowNum"),with=FALSE]
          # with=FALSE necessary for the second argument to 
          #    be understood as column names in a data table
    setkey(ref,"V1") # the column used for the merges

    tab1 <- dt[which(dt$V2!=""),c("V2","V3"),with=FALSE]
    setkey(tab1,"V2")

    tab2 <- dt[which(dt$V4!=""),c("V4","V5"),with=FALSE]
    setkey(tab2,"V4")

    # merge tab1 to ref and tab2 to the product
    # using data.table formalism
    df3 <- tab2[tab1[ref,allow.cartesian=TRUE],allow.cartesian=TRUE]
             # allow.cartesian=TRUE important to keep all in ref

    # and to get exactly the same output
    setkey(df3,"RowNum") # order df3 by RowNum
    df3 <- within(df3,{
                  V2 <- V1 <- V4 # make the columns V1 and V2
                  V4[is.na(V5)]<-NA # put back NA
                  V2[is.na(V3)]<-NA
    })
    setcolorder(df3,sort(names(df3))) # sort V1 to V5

2) Using base R, following the same steps as above:

    # split df into three tables
    ref <- df[which(df$V1!=""),c("V1","RowNum")]
    tab1 <- df[which(df$V2!=""),c("V2","V3")]
    tab2 <- df[which(df$V4!=""),c("V4","V5")]

Then you merge:

    df2 <- merge(ref,tab1,by.x="V1",by.y="V2",all=TRUE)
    df2 <- merge(df2,tab2,by.x="V1",by.y="V4",all=TRUE)
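To see what `merge(..., all=TRUE)` does with duplicated or missing keys, here is a minimal sketch on toy data. The tables `ref_toy` and `tab_toy` and their values are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy tables, only to illustrate merge() with by.x / by.y
ref_toy <- data.frame(V1 = c("b", "a", "c"), RowNum = 1:3,
                      stringsAsFactors = FALSE)
tab_toy <- data.frame(V2 = c("a", "a", "b"), V3 = c("x1", "x2", "y1"),
                      stringsAsFactors = FALSE)

# all = TRUE keeps keys that appear in only one table (here "c"),
# filling the missing side with NA; a key that appears twice in
# tab_toy ("a") yields two rows in the result.
out_toy <- merge(ref_toy, tab_toy, by.x = "V1", by.y = "V2", all = TRUE)
print(out_toy)

# merge() sorts by the key by default, so RowNum is what lets us
# go back to the original order of ref_toy afterwards.
out_toy <- out_toy[order(out_toy$RowNum), ]
print(out_toy)
```

The key column keeps the name given in `by.x` (`V1`), which is why the V2 and V4 columns have to be rebuilt afterwards.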

After that, if you want exactly the same output:

    df2 <- df2[order(df2$RowNum),] # order by RowNum
    # make the V2 column
    df2$V2 <- df2$V1
    df2$V2[which(is.na(df2$V3))] <- NA

    # make the V4 column
    df2$V4 <- df2$V1
    df2$V4[which(is.na(df2$V5))] <- NA

    # order the columns as wanted
    df2 <- df2[,c("V1","V2","V3","V4","V5")]
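For convenience, the base R steps can be put together into one self-contained sketch. The last few lines are my own assumption about how to mimic the display in the question (blanking a repeated V1 and printing `-` for missing values), and `pad` is a hypothetical helper used only to rebuild the question's data compactly.

```r
# Hypothetical helper: pad a vector with "" up to n entries
pad <- function(x, n = 13) c(x, rep("", n - length(x)))

# Same data as in the question
df <- data.frame(
  V1 = pad(c("ATP1A2", "CAPRIN1", "ATP1A1", "CBX3", "AUP1", "LARS2",
             "MTHFD1", "VDAC2", "PRKCSH", "ATP1B1", "B3GNT3")),
  V2 = pad(c("ATP1A1", "ATP1A2", "ATP1B1", "AUP1", "B3GNT3",
             "CAPRIN1", "CAPRIN1", "CBX3")),
  V3 = pad(c("220948_s_at", "203296_s_at", "201243_s_at", "220525_s_at",
             "204856_at", "200722_s_at", "200723_s_at", "200037_s_at")),
  V4 = pad(c("LARS2", "MTHFD1", "PRKCSH", "PRKCSH", "VDAC2")),
  V5 = pad(c("204016_at", "202309_at", "200707_at", "214080_x_at",
             "211662_s_at")),
  stringsAsFactors = FALSE
)

df$RowNum <- seq_len(nrow(df))                 # remember the order of V1

# Split into three tables, dropping the empty cells
ref  <- df[df$V1 != "", c("V1", "RowNum")]
tab1 <- df[df$V2 != "", c("V2", "V3")]
tab2 <- df[df$V4 != "", c("V4", "V5")]

# Join both pairs onto V1, then restore the original order
df2 <- merge(ref, tab1, by.x = "V1", by.y = "V2", all = TRUE)
df2 <- merge(df2, tab2, by.x = "V1", by.y = "V4", all = TRUE)
df2 <- df2[order(df2$RowNum), ]

# Rebuild V2 and V4 from V1, leaving NA where there was no match
df2$V2 <- replace(df2$V1, is.na(df2$V3), NA)
df2$V4 <- replace(df2$V1, is.na(df2$V5), NA)

# Assumption: show a repeated V1 only once and print "-" for NA
df2$V1[duplicated(df2$RowNum)] <- NA
out <- df2[, c("V1", "V2", "V3", "V4", "V5")]
out[is.na(out)] <- "-"
rownames(out) <- NULL
print(out)
```

If you would rather keep real missing values for further processing, skip the replacement with `"-"` and work with the NA values directly.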


Edited on 2021-03-27
