Goal:
I have a dataset, df, and I would like to group by ID and find the duration for rows that meet these conditions: Focus == True, Read == True, and ID != ""
ID    Date                   Focus    Read
A     1/2/2020 5:00:00 AM    True     True
A     1/2/2020 5:00:05 AM    True     True
      1/3/2020 6:00:00 AM    True
      1/3/2020 6:00:05 AM    True
B     1/4/2020 7:00:00 AM    True     True
B     1/4/2020 7:00:02 AM    True     True
B     1/4/2020 7:00:10 AM    True     True
I would like this output:
ID Duration
A 5 sec
B 10 sec
dput:
structure(list(ID = structure(c(2L, 2L, 1L, 1L, 3L, 3L, 3L), .Label = c("",
"A", "B"), class = "factor"), Date = structure(1:7, .Label = c("1/2/2020 5:00:00 AM",
"1/2/2020 5:00:05 AM", "1/3/2020 6:00:00 AM", "1/3/2020 6:00:05 AM",
"1/4/2020 7:00:00 AM", "1/4/2020 7:00:02 AM", "1/4/2020 7:00:10 AM"
), class = "factor"), Focus = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "True ", class = "factor"), Read = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 2L), .Label = c("", "True "), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
What I have tried:
df %>% group_by(ID) %>%
mutate(Date = lubridate::mdy_hms(Date),
cond = Focus == "TRUE" & Read=="TRUE" & ID != "" ,
grp = cumsum(!cond)) %>%
filter(cond) %>%
group_by(grp) %>%
summarise(starttime = first(Date),
endtime = last(Date),
duration = difftime(endtime, starttime, units = "secs")) %>%
select(-grp)
However, this is not grouped by ID, since I do not see ID in the output.
Any suggestion is appreciated.
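We can first filter the rows where 'Read' and 'Focus' are true, convert 'Date' to a date-time class, group by 'ID', and get the 'Duration' as the difference in seconds between the first and last 'Date' of each group. The flags are stored as the string "True " with trailing whitespace, so a comparison against "TRUE" never matches; trimws() followed by as.logical() handles this. The rows with an empty ID also have an empty 'Read', so the filter drops them as well. In the attempt, the later group_by(grp) also replaces the grouping by ID, which is why ID is missing from the output.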
library(dplyr)
library(lubridate)
df %>%
filter(as.logical(trimws(Read)), as.logical(trimws(Focus))) %>%
mutate(Date = mdy_hms(Date)) %>%
group_by(ID) %>%
summarise(Duration = difftime(last(Date), first(Date), units = "secs"))
# A tibble: 2 x 2
# ID Duration
# <fct> <drtn>
#1 A 5 secs
#2 B 10 secs
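As a variant of the approach above, here is a minimal sketch that states the ID != "" condition explicitly instead of relying on the empty 'Read' values to drop those rows. It assumes the columns are plain character vectors (the sample data is rebuilt that way here rather than taken from the dput output), and the names keep and result are only illustrative. It also uses min() and max() rather than first() and last(), so it does not depend on the rows being sorted by date, and it returns the duration as a plain number of seconds.

```r
library(dplyr)
library(lubridate)

# Sample data rebuilt as character columns (an assumption; the original uses factors).
# Note the trailing space in "True ", as in the dput output.
df <- data.frame(
  ID    = c("A", "A", "", "", "B", "B", "B"),
  Date  = c("1/2/2020 5:00:00 AM", "1/2/2020 5:00:05 AM",
            "1/3/2020 6:00:00 AM", "1/3/2020 6:00:05 AM",
            "1/4/2020 7:00:00 AM", "1/4/2020 7:00:02 AM",
            "1/4/2020 7:00:10 AM"),
  Focus = "True ",
  Read  = c("True ", "True ", "", "", "True ", "True ", "True "),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    Date = mdy_hms(Date),                       # parse month/day/year with AM/PM times
    keep = trimws(Focus) == "True" &            # strip whitespace before comparing
           trimws(Read) == "True" &
           ID != ""                             # explicit check for a non-empty ID
  ) %>%
  filter(keep) %>%
  group_by(ID) %>%
  summarise(
    # span between earliest and latest timestamp per ID, as numeric seconds
    Duration = as.numeric(difftime(max(Date), min(Date), units = "secs"))
  )

print(result)
```

Converting with as.numeric() is a design choice: it drops the difftime class, which can be convenient if the durations are later summed or plotted.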