R: Aggregation (median) of data.frame rows for > 2 columns

Posted by Mal_a in Dev

Mal_a

I would like to aggregate my data.frame.

Here is the sample data:

data <- structure(list(
  Charge = c(210133L, 210133L, 210133L, 210152L, 210152L, 210152L,
             210152L, 210180L, 210180L, 210180L),
  Seq = c(1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 2L, 2L),
  x = c(NA, 1.5, 2, 1.5, 1, 0.67, 1.17, 1, 1, 1),
  y = c(0.5, 0.5, 1, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
), .Names = c("Charge", "Seq", "x", "y"),
   row.names = c(NA, 10L), class = "data.frame")

*For easier reading (same data as above, different format):

   Charge Seq    x   y
1  210133   1   NA 0.5
2  210133   2 1.50 0.5
3  210133   3 2.00 1.0
4  210152   1 1.50  NA
5  210152   2 1.00 0.5
6  210152   3 0.67 0.5
7  210152   4 1.17 0.5
8  210180   1 1.00 0.5
9  210180   2 1.00 0.5
10 210180   2 1.00 0.5

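For each unique Charge, the median of the x and y columns has to be computed over the rows where Seq > 1.

So, for this sample data, what I would like to get is an additional row per Charge holding the medians of x and y over the rows with Seq > 1: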

       Charge Seq    x   y
    1  210133   1   NA 0.5
    2  210133   2 1.50 0.5
    3  210133   3 2.00 1.0
    4  210133   >1 1.75 0.75 #here is additional row with median of x and y
    4  210152   1 1.50  NA
    5  210152   2 1.00 0.5...

Thanks for the help!

akrun

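We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(data)). With the condition Seq > 1 in 'i', group by 'Charge', loop over the columns specified in .SDcols (lapply(.SD, ...)) to get the median, and create a 'Seq' column with the value ">1". Then combine the original data with the new rows using rbind, and order the result with setorder if necessary.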

library(data.table)
setDT(data)
res <- data[Seq > 1L, lapply(.SD, median, na.rm=TRUE), 
            by = Charge, .SDcols = x:y][, Seq := ">1"][]
ans <- setorder(rbind(data, res), Charge, Seq)
#    Charge Seq    x    y
# 1: 210133   1   NA 0.50
# 2: 210133   2 1.50 0.50
# 3: 210133   3 2.00 1.00
# 4: 210133  >1 1.75 0.75
# 5: 210152   1 1.50   NA
# 6: 210152   2 1.00 0.50
# 7: 210152   3 0.67 0.50
# 8: 210152   4 1.17 0.50
# 9: 210152  >1 1.00 0.50
#10: 210180   1 1.00 0.50
#11: 210180   2 1.00 0.50
#12: 210180   2 1.00 0.50
#13: 210180  >1 1.00 0.50
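
As a rough sketch, the chained call above can be unpacked into separate steps on a fresh copy of the sample data. The names `dt`, `res`, and `ans` are only illustrative, `.SDcols` is spelled out as a character vector, and `use.names = TRUE` is passed explicitly so that rows are matched by column name rather than position:

```r
library(data.table)

# Fresh copy of the sample data (illustrative name `dt`)
dt <- data.table(
  Charge = c(210133L, 210133L, 210133L, 210152L, 210152L, 210152L,
             210152L, 210180L, 210180L, 210180L),
  Seq = c(1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 2L, 2L),
  x = c(NA, 1.5, 2, 1.5, 1, 0.67, 1.17, 1, 1, 1),
  y = c(0.5, 0.5, 1, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
)

# Step 1: keep rows with Seq > 1 (the 'i' condition), then take the
# median of x and y within each Charge
res <- dt[Seq > 1L, lapply(.SD, median, na.rm = TRUE),
          by = Charge, .SDcols = c("x", "y")]

# Step 2: label the summary rows
res[, Seq := ">1"]

# Step 3: stack original and summary rows, matching columns by name;
# Seq is integer in dt and character in res, so it becomes character
ans <- rbind(dt, res, use.names = TRUE)

# Step 4: sort by Charge, then Seq (in place)
setorder(ans, Charge, Seq)
ans
```

Because `Seq` is integer in one table and character in the other, the combined column ends up as character. data.table sorts character columns in C-locale, which is why ">1" lands after the digit strings within each Charge.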

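A similar option uses dplyr. First convert the class of 'Seq' in the original dataset to character. Then filter the rows where 'Seq' is not equal to 1, group by 'Charge', get the median of the columns with summarise_each, create the new 'Seq' column in the output, bind it with the original data using bind_rows, and order with arrange if necessary.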

library(magrittr)
library(dplyr)
data %<>%
     mutate(Seq = as.character(Seq))

data %>% 
   filter(Seq!="1") %>%
   group_by(Charge) %>% 
   summarise_each(funs(median=median(., na.rm=TRUE)), x:y) %>%
   mutate(Seq = ">1") %>% 
   bind_rows(data, .) %>% 
   mutate(Seq = factor(Seq, levels = c(unique(data$Seq), ">1"))) %>% 
   arrange(Charge, Seq)
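
The `summarise_each()` and `funs()` helpers used above are deprecated in more recent dplyr releases. A minimal sketch of the same idea with `across()`, assuming dplyr 1.0.0 or later, might look like this (the names `medians` and `result` are illustrative and not part of the original answer):

```r
library(dplyr)

# Sample data rebuilt as a plain data.frame
data <- data.frame(
  Charge = c(210133L, 210133L, 210133L, 210152L, 210152L, 210152L,
             210152L, 210180L, 210180L, 210180L),
  Seq = c(1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 2L, 2L),
  x = c(NA, 1.5, 2, 1.5, 1, 0.67, 1.17, 1, 1, 1),
  y = c(0.5, 0.5, 1, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
)

# Per-Charge medians of x and y over the rows with Seq > 1
medians <- data %>%
  filter(Seq > 1L) %>%
  group_by(Charge) %>%
  summarise(across(c(x, y), ~ median(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  mutate(Seq = ">1")

# Numeric Seq values in ascending order, followed by the ">1" label
seq_levels <- c(as.character(sort(unique(data$Seq))), ">1")

# Append the summary rows and sort so ">1" comes last within each Charge
result <- data %>%
  mutate(Seq = as.character(Seq)) %>%
  bind_rows(medians) %>%
  mutate(Seq = factor(Seq, levels = seq_levels)) %>%
  arrange(Charge, Seq)

result
```

Turning `Seq` into a factor with ">1" as the last level keeps the ordering explicit instead of relying on how strings happen to sort in the current locale.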

Edited on 2021-04-15
