Select the first row and aggregate within group_by in an R data frame

Daniel G

I have the following data:

 df <-  tibble::tribble(
      ~V1,          ~V2,              ~V3,      ~V4,       ~V5,
    "CTV10016020", "PoP", "2020-06-08 01:50:07", 220L,   "Music",
    "CTV10016020", "PoP", "2020-06-08 01:53:45",   8L,    "Music",
    "CTV10016020", "PoP", "2020-06-08 01:53:53", 133L,   "Music",
    "CTV10016020", "PoP", "2020-06-08 01:56:05", 234L,   "Music",
    "CTV10016020", "PoP", "2020-06-08 01:59:57",   0L, "Control",
    "CTVM11011420", "Game", "2020-06-08 02:03:00",   0L, "Control",
    "CTVM11011420", "Game", "2020-06-08 02:03:00",  10L,    "Music",
    "CTVM11011420", "Game", "2020-06-08 02:03:07", 116L,   "Music",
    "CTVM11011420", "Game", "2020-06-08 02:05:01",  32L,   "Audio",
    "CTVM11011420", "Game", "2020-06-08 02:05:32", 208L,   "Music",
    "CTVM11011420", "Game", "2020-06-08 02:08:36",  42L,   "Audio"
    )

I want to group_by V1 and V2, keep the first V3 record in each group, and compute the sum of V4.

Expected output for the sample data:

   V1           V2    V3                   total               
   <chr>        <chr> <dttm>              <int>             
 1 CTV10016020   PoP   2020-06-08 01:50:07   595 
 2 CTVM11011420  Game  2020-06-08 02:03:00   408

My attempt: I tried dplyr::first, but I think I am using it the wrong way.

 df %>% 
   mutate(V3= as.POSIXct(V3, "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki")) %>% 
   group_by(V1, V2) %>% 
   dplyr::mutate(
     first = dplyr::first(V3)) %>%
   summarize(total_duration = sum(V4))

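akrun

The OP's approach should work if we drop the mutate step after group_by and compute first(V3) inside summarise instead. After summarise, the output contains only the columns created in summarise together with the grouping columns, so the column created by mutate (first(V3)) never makes it into the output.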

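 library(dplyr)

 df %>%
    # parse the timestamp strings into date-times
    mutate(V3 = as.POSIXct(V3, format = "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki")) %>%
    group_by(V1, V2) %>%
    # one row per group: first V3 in row order and the sum of V4
    summarise(V3 = first(V3), total = sum(V4))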
# A tibble: 2 x 4
# Groups:   V1 [2]
#  V1           V2    V3                  total
#  <chr>        <chr> <dttm>              <int>
#1 CTV10016020  PoP   2020-06-08 01:50:07   595
#2 CTVM11011420 Game  2020-06-08 02:03:00   408
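
To make the explanation above concrete, here is a minimal sketch that runs the question's attempt and the corrected version side by side and compares the resulting column names. It uses a cut-down, four-row stand-in for the question's data (an assumption for brevity, so the totals differ from the full example), leaves V3 as character, and assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Cut-down stand-in for the question's data (assumption: same columns, fewer rows).
df_small <- tibble::tribble(
  ~V1,            ~V2,    ~V3,                   ~V4,
  "CTV10016020",  "PoP",  "2020-06-08 01:50:07", 220L,
  "CTV10016020",  "PoP",  "2020-06-08 01:53:45",   8L,
  "CTVM11011420", "Game", "2020-06-08 02:03:00",  10L,
  "CTVM11011420", "Game", "2020-06-08 02:03:07", 116L
)

# Attempt from the question: the column added by mutate() is not named
# in summarise(), so it does not appear in the result.
attempt <- df_small %>%
  group_by(V1, V2) %>%
  mutate(first = dplyr::first(V3)) %>%
  summarise(total_duration = sum(V4), .groups = "drop")

# Corrected version: take first(V3) inside summarise() itself.
fixed <- df_small %>%
  group_by(V1, V2) %>%
  summarise(V3 = dplyr::first(V3), total = sum(V4), .groups = "drop")

names(attempt)  # "V1" "V2" "total_duration"
names(fixed)    # "V1" "V2" "V3" "total"
```

Note that first() picks the first row in the current row order of each group. If the rows were not already sorted by time, calling min(V3) on the parsed date-time column, or an arrange() before grouping, would be one way to get the earliest timestamp instead.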
