How do I sort and count a data frame by three variables in R?

山医生

I have a data frame, dftime, with many variables; a snapshot of the data looks like this:

| gene  | country | case_month | case_year |
| ----- | ------- | ---------- | --------- |
| gene1 | Senegal | February   | 2020      |
| gene2 | Botswana| January    | 2021      |
| gene3 | Congo   | March      | 2021      |
| gene4 | Guinea  | September  | 2020      |

Here is a reproducible sample:

structure(list(gene = c("gene1", "gene2", 
"gene3", "gene4", "gene5", 
"gene6"), date = structure(c(18319, 18328, 
18320, 18323, 18325, 18324), class = "Date"), country = c("Nigeria", 
"South Africa", "Senegal", "Senegal", "Senegal", "Senegal"), 
    case_month = c("February", "March", "February", "March", 
    "March", "March"), case_year = c("2020", "2020", "2020", 
    "2020", "2020", "2020")), row.names = c(1L, 3L, 22L, 23L, 
24L, 25L), class = "data.frame")

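I left the date variable in, in case it helps! I derived case_month and case_year from the date.

There are 38 countries in total, all 12 months are represented, and the only two years are 2020 and 2021. I am trying to arrange the data so that I get the number of genes for Senegal in January 2020, for Senegal in February 2020, and so on, giving a count n of all genes for each country in each month of both years. I would like output like this: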

| country | case_month | case_year | n |
| ------- | ---------- | --------- |---|
| Senegal | January    | 2020      | 4 |
| Senegal | February   | 2020      | 6 |
| Senegal | March      | 2020      | 5 |
| Botswana| January    | 2021      | 1 |
| Congo   | March      | 2021      | 2 |

And so on...

The goal is to use this count to produce a stacked bar chart like the following, where n is the new count variable:

dftime_stacked <- ggplot(dftime_ord, aes(fill=country, y= n, x=case_month)) + 
  geom_bar(position="stack", stat="identity")

dftime_stacked + facet_wrap(~ case_year)

I tried to order the data with dplyr, using mutate:

dftime_ord <- mutate(dftime, country = reorder(country, -n, sum),
                     case_month = reorder(case_month, -n, sum))

However, this raises two errors. The first is about -n:

Error in -n : invalid argument to unary operator

The second appears when I take that out (sorting from largest to smallest is not essential here, since my countries are in alphabetical order anyway):

Error in tapply(X = X, INDEX = x, FUN = FUN, ...) : 
  arguments must have same length

All of my variables are character. Is there a reason they cannot be ordered this way in dplyr? Any idea why it throws these errors? Many thanks for any help!
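
Both messages can be reproduced without dplyr. This is a minimal sketch, assuming that dftime has no column named n, so that inside mutate() the name n resolves to the function dplyr::n rather than to a vector of counts. The local n below is a hypothetical stand-in for that function, and the country vector is a shortened copy of the sample data:

```r
# Hypothetical stand-in for dplyr::n (an assumption: this is what `n`
# resolves to when the data frame has no column called n).
n <- function() 1L

country <- c("Nigeria", "South Africa", "Senegal", "Senegal")

# Unary minus is not defined for a function, which gives the first error.
err1 <- tryCatch(-n, error = function(e) conditionMessage(e))

# Without the minus sign, reorder() hands the function to tapply(),
# whose length check fails: a function has length 1, country has length 4.
err2 <- tryCatch(reorder(country, n, sum),
                 error = function(e) conditionMessage(e))

print(err1)
print(err2)
```

In both cases reorder() never receives a numeric vector with one value per row, so the counts have to exist as a column before anything can be reordered by n.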

沙美沙克尔

You can control the ordering with a data.table solution:

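# Rebuild the sample table from the question; with sep = '|' the padding
# spaces stay inside each value (pass strip.white = TRUE to drop them)
df <- read.table(textConnection(' gene  | country | case_month | case_year 
 gene1 | Senegal | February   | 2020      
 gene2 | Botswana| January    | 2021      
 gene3 | Congo   | March      | 2021      
 gene4 | Guinea  | September  | 2020      '),sep='|',header=TRUE)

library(data.table)

# Convert the data frame to a data.table by reference
setDT(df)

# Count rows (.N) for each country / year / month combination
df <- df[,.(n=.N),by=c('country','case_year','case_month')]

# Sort by country, then case_month, both in descending order (-1)
setorderv(df,c('country','case_month'),c(-1,-1))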

Output:

  country     case_year case_month         n
  <fct>           <dbl> <fct>          <int>
1 " Senegal "      2020 " February   "     1
2 " Guinea  "      2020 " September  "     1
3 " Congo   "      2021 " March      "     1
4 " Botswana"      2021 " January    "     1
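
For comparison, the same count can be sketched in dplyr on the reproducible sample from the question. This is one possible approach rather than part of the answer above. It assumes dplyr is installed, rebuilds only the columns that are needed, and turns case_month into a factor with month.name as its levels so that months sort in calendar order instead of alphabetically:

```r
library(dplyr)

# Columns copied from the reproducible sample in the question
# (the date column and the original row names are left out).
dftime <- data.frame(
  gene = paste0("gene", 1:6),
  country = c("Nigeria", "South Africa", "Senegal",
              "Senegal", "Senegal", "Senegal"),
  case_month = c("February", "March", "February",
                 "March", "March", "March"),
  case_year = rep("2020", 6)
)

dftime_ord <- dftime %>%
  # Calendar order for months: January, February, ..., December
  mutate(case_month = factor(case_month, levels = month.name)) %>%
  # One row per country / year / month, with the row count stored in n
  count(country, case_year, case_month, name = "n") %>%
  arrange(country, case_year, case_month)

print(dftime_ord)
```

The resulting dftime_ord has the country, case_year, case_month and n columns that the ggplot call in the question expects, so it could be passed to that call directly.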


Edited 2021-09-13
