I have the following data:
df <- tibble::tribble(
~V1, ~V2, ~V3, ~V4, ~V5,
"CTV10016020", "PoP", "2020-06-08 01:50:07", 220L, "Music",
"CTV10016020", "PoP", "2020-06-08 01:53:45", 8L, "Music",
"CTV10016020", "PoP", "2020-06-08 01:53:53", 133L, "Music",
"CTV10016020", "PoP", "2020-06-08 01:56:05", 234L, "Music",
"CTV10016020", "PoP", "2020-06-08 01:59:57", 0L, "Control",
"CTVM11011420", "Game", "2020-06-08 02:03:00", 0L, "Control",
"CTVM11011420", "Game", "2020-06-08 02:03:00", 10L, "Music",
"CTVM11011420", "Game", "2020-06-08 02:03:07", 116L, "Music",
"CTVM11011420", "Game", "2020-06-08 02:05:01", 32L, "Audio",
"CTVM11011420", "Game", "2020-06-08 02:05:32", 208L, "Music",
"CTVM11011420", "Game", "2020-06-08 02:08:36", 42L, "Audio"
)
I want to group_by V1 and V2, keep the first V3 record in each group, and compute the sum of V4.
Expected output for the sample data:
  V1           V2    V3                  total
  <chr>        <chr> <dttm>              <int>
1 CTV10016020  PoP   2020-06-08 01:50:07   595
2 CTVM11011420 Game  2020-06-08 02:03:00   408
My attempt: I tried dplyr::first, but I think I am using it the wrong way.
df %>%
  mutate(V3 = as.POSIXct(V3, "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki")) %>%
  group_by(V1, V2) %>%
  dplyr::mutate(
    first = dplyr::first(V3)) %>%
  summarize(total_duration = sum(V4))
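The OP's approach should work if we remove the mutate step that follows group_by and compute first(V3) inside summarise instead. After summarise, the result contains only the grouping columns plus the columns created in summarise itself, so the first column created by mutate never reaches the output.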
library(dplyr)
df %>%
  mutate(V3 = as.POSIXct(V3, "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki")) %>%
  group_by(V1, V2) %>%
  summarise(V3 = first(V3), total = sum(V4))
# A tibble: 2 x 4
# Groups:   V1 [2]
#   V1           V2    V3                  total
#   <chr>        <chr> <dttm>              <int>
# 1 CTV10016020  PoP   2020-06-08 01:50:07   595
# 2 CTVM11011420 Game  2020-06-08 02:03:00   408
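To see why the original attempt loses the column, here is a minimal sketch that compares the column names of the two results. It assumes dplyr 1.0.0 or later (for the .groups argument), uses a reduced copy of the sample data, and skips the as.POSIXct conversion for brevity; the object names attempt and fixed are only illustrative.

```r
library(dplyr)

# Reduced copy of the sample data (two rows per group), for illustration only
df <- tibble::tribble(
  ~V1,            ~V2,    ~V3,                   ~V4,  ~V5,
  "CTV10016020",  "PoP",  "2020-06-08 01:50:07", 220L, "Music",
  "CTV10016020",  "PoP",  "2020-06-08 01:53:45",   8L, "Music",
  "CTVM11011420", "Game", "2020-06-08 02:03:00",   0L, "Control",
  "CTVM11011420", "Game", "2020-06-08 02:03:07", 116L, "Music"
)

# Original attempt: `first` is created by mutate(), then dropped by summarise(),
# which keeps only the grouping columns and the summaries it creates
attempt <- df %>%
  group_by(V1, V2) %>%
  mutate(first = dplyr::first(V3)) %>%
  summarise(total = sum(V4), .groups = "drop")

# Fix: compute first(V3) inside summarise() so it becomes part of the result
fixed <- df %>%
  group_by(V1, V2) %>%
  summarise(V3 = dplyr::first(V3), total = sum(V4), .groups = "drop")

names(attempt)  # "V1" "V2" "total"
names(fixed)    # "V1" "V2" "V3" "total"
```

Passing .groups = "drop" here returns an ungrouped tibble, so the "# Groups: V1 [2]" line seen in the output above would not appear; leave it out if you want to keep the default grouping behaviour.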