How to group by consecutive rows in an R data frame?

Fred Johnson

I have a data frame of time-series data with the columns TimeStamp, Type and Value. Type says whether a row is a peak or a valley. I want to:

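1. Group all the data by consecutive Type values.
2. For each "peak" group, select the highest value.
3. For each "valley" group, select the lowest value.
4. Filter the data frame down to these highest/lowest values.

Expected: a data frame whose rows alternate between the highest peak and the lowest valley.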

The only way I know to do this is with a for loop: collect consecutive values into a vector, take the maximum, push it into a new data frame, and so on.

For those who know Python, this is what I did (but I need to port my code to R):

# group id: increments each time pv_type differs from the previous row
segmentation['min_v'] = segmentation.groupby( segmentation.pv_type.ne(segmentation.pv_type.shift()).cumsum() ).price.transform(min)
segmentation['max_p'] = segmentation.groupby( segmentation.pv_type.ne(segmentation.pv_type.shift()).cumsum() ).price.transform(max)
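A rough base-R counterpart of the pandas grouping key `pv_type.ne(pv_type.shift()).cumsum()` is sketched below. It assumes the `types`/`values` columns from the sample data rather than `pv_type`/`price`: flag every row whose type differs from the row above, then take the cumulative sum so each consecutive run gets its own integer id.

```r
types  <- c("peak", "peak", "valley", "peak", "valley", "valley", "valley")
values <- c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2)

# TRUE on the first row and wherever the type changes from the previous row,
# mirroring ne(shift()) in pandas (shift() gives NaN on row 0, so ne() is TRUE)
changed <- c(TRUE, types[-1] != types[-length(types)])

# Cumulative sum of the change flags: one id per run of identical types
grp <- cumsum(changed)
print(grp)  # [1] 1 1 2 3 4 4 4
```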

Edit

Sample data set:

types <- c('peak', 'peak', 'valley', 'peak', 'valley', 'valley', 'valley')
values <- c(1.01,   1.00,    0.4,     1.2,     0.3,      0.1,      0.2)
segmentation <- data.frame(types, values)
segmentation

expectedTypes <- c('peak', 'valley', 'peak', 'valley')
expectedValues <- c(1.00, 0.4, 1.2, 0.1 )
expectedResult <- data.frame(expectedTypes, expectedValues)
expectedResult

I don't know a better way to produce the sample data.

akrun

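In R, one way to do this with dplyr is to use the cumulative sum of a logical comparison between 'pv_type' and its lag as the grouping column, and then compute the min and max of 'price' as two new columns: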

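library(dplyr)
segmentation %>%
       # group id increases by 1 each time pv_type differs from the previous row
       group_by(pv_type_group = cumsum(pv_type != lag(pv_type,
                 default = first(pv_type)))) %>%
       # per-group min and max, repeated on every row of the group
       mutate(min_v = min(price), max_p = max(price))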
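If loading dplyr is not an option, the same "add per-group min and max to every row" step (what `mutate()` and pandas `transform()` do) can be sketched in base R with `ave()`. This is an illustrative alternative, not part of the answer above; it uses the sample columns `types`/`values`, and `grp`, `min_v`, `max_p` are just illustrative names.

```r
segmentation <- data.frame(
  types  = c("peak", "peak", "valley", "peak", "valley", "valley", "valley"),
  values = c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2),
  stringsAsFactors = FALSE
)

n <- nrow(segmentation)
# Run id: starts at 1 and increments whenever the type changes
segmentation$grp <- cumsum(c(TRUE, segmentation$types[-1] != segmentation$types[-n]))

# ave() applies FUN within each group and returns a vector as long as the
# input, so every row receives its run's minimum and maximum
segmentation$min_v <- ave(segmentation$values, segmentation$grp, FUN = min)
segmentation$max_p <- ave(segmentation$values, segmentation$grp, FUN = max)
print(segmentation)
```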

Update

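In the OP's example the expected output is summarised (one row per run), so we use summarise instead of mutate. We also use rleid (from data.table) instead of the cumulative sum of the logical comparison: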

library(dplyr)
library(data.table)   # for rleid()
segmentation %>% 
    group_by(grp = rleid(types)) %>%   # one group per run of identical types
    summarise(types = first(types), expectedvalues = min(values)) %>%
    ungroup %>%
    select(-grp)
# A tibble: 4 x 2
#  types  expectedvalues
# <fct>           <dbl>
#1 peak              1  
#2 valley            0.4
#3 peak              1.2
#4 valley            0.1
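The summarise above takes min() of every run, which reproduces the expected output given in the question (1.00 for the first peak run). If the rule stated in the question is wanted instead (highest value for peak runs, lowest for valley runs), a minimal base-R sketch could look like the following. It assumes `types` is a character vector (rle() rejects factors) and uses `rle()` plus `rep()` to build the same run ids that `rleid()` returns.

```r
segmentation <- data.frame(
  types  = c("peak", "peak", "valley", "peak", "valley", "valley", "valley"),
  values = c(1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2),
  stringsAsFactors = FALSE  # rle() needs an atomic vector, not a factor
)

# rle() collapses consecutive equal types into runs; rep() expands the
# run index back to one id per row
runs <- rle(segmentation$types)
grp  <- rep(seq_along(runs$lengths), runs$lengths)

# Split the values by run, then keep max() for peak runs and min() for valleys
vals_by_run <- split(segmentation$values, grp)
picked <- mapply(function(v, t) if (t == "peak") max(v) else min(v),
                 vals_by_run, runs$values)

result <- data.frame(types = runs$values, values = unname(picked),
                     stringsAsFactors = FALSE)
print(result)
```

With this rule the first peak run yields 1.01 rather than 1.00; the remaining rows match the expected result.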

Edited 2021-01-16