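I have a dataset whose rows I want to split at 12:00 pm (noon): if a row runs from 08:00 to 13:00, it should become two rows, 08:00-12:00 and 12:00-13:00. The duration and cumulative-sum variables need to change accordingly, but all other variables should stay the same as in the original row.
This should work across different id values.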
id = unchanged from row 1, just repeated
start = changed in both rows
end = changed in both rows
day = unchanged from row 1, just repeated
duration = changed in both rows
cumulative time = changed in both rows
ORIGINAL DATAFILE
#Current dataframe
id<-c("m1","m1")
x<-c("2020-01-03 10:00:00","2020-01-03 19:20:00")
start<-strptime(x,"%Y-%m-%d %H:%M:%S")
y<-c("2020-01-03 16:00:00","2020-01-03 20:50:00")
end<-strptime(y,"%Y-%m-%d %H:%M:%S")
day<-c(1,1)
mydf<-data.frame(id,start,end,day)
# calculate duration and time
mydf$duration<-as.numeric(difftime(mydf$end,mydf$start,units = "hours"))
mydf$time<-c(cumsum(mydf$duration))
REQUIRED DATAFILE
#Required dataframe
id2<-c("m1","m1","m1")
x2<-c("2020-01-03 10:00:00","2020-01-03 12:00:00","2020-01-03 19:20:00")
start2<-strptime(x2,"%Y-%m-%d %H:%M:%S")
y2<-c("2020-01-03 12:00:00","2020-01-03 16:00:00","2020-01-03 20:50:00")
end2<-strptime(y2,"%Y-%m-%d %H:%M:%S")
day2<-c(1,1,1)
mydf2<-data.frame(id2,start2,end2,day2)
# calculate duration and time
mydf2$duration<-c(2,4,1.5)
mydf2$time<-c(2,6,7.5)
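Nice question. Each row implicitly contains one or two intervals, so you should be able to define those intervals on each row and then pivot to long format, but you can't pivot with interval values (yet?). So here's my approach: it computes up to two shift start times for each row, and then, after pivoting, infers each shift's end from the start of the next shift. Comments inline.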
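The core test in this approach, whether noon falls inside a given shift, can be tried on its own before reading the full pipeline. The following is a minimal sketch using only lubridate; the vector names `midday` and `needs_split` are illustrative, and the timestamps are assumed to be in UTC, as `as_datetime()` produces by default.

```r
library(lubridate, warn.conflicts = FALSE)

# The two shifts from the question (UTC assumed)
start <- as_datetime(c("2020-01-03 10:00:00", "2020-01-03 19:20:00"))
end   <- as_datetime(c("2020-01-03 16:00:00", "2020-01-03 20:50:00"))

# Noon on the calendar day on which each shift starts
midday <- floor_date(start, "day") + hours(12)

# TRUE where noon lies inside the shift, i.e. the row needs splitting
needs_split <- midday %within% interval(start, end)
needs_split

# Blank out noon for shifts that do not contain it
midday[!needs_split] <- NA
midday
```

Only the first shift contains noon, so only that row gains a split point. Note that `%within%` treats the interval endpoints as inside, so a shift starting or ending exactly at noon would also pass this test; a strict comparison (`midday > start & midday < end`) may be preferable if zero-length pieces are unwanted. The full pipeline follows.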
library(lubridate, warn.conflicts = FALSE)
library(tidyverse)
library(magrittr, warn.conflicts = FALSE)
library(hablar, warn.conflicts = FALSE)
(mydf <- tibble(
  id = "m1",
  start = as_datetime(c("2020-01-03 10:00:00", "2020-01-03 19:20:00")),
  end = as_datetime(c("2020-01-03 16:00:00", "2020-01-03 20:50:00")),
  day = 1
))
#> # A tibble: 2 x 4
#>   id    start               end                   day
#>   <chr> <dttm>              <dttm>              <dbl>
#> 1 m1    2020-01-03 10:00:00 2020-01-03 16:00:00     1
#> 2 m1    2020-01-03 19:20:00 2020-01-03 20:50:00     1
(mydf2 <-
  mydf %>%
  # Assume the relevant noontime cutoff is on the same day as the start
  mutate(midday =
           start %>% as_date() %>%
           add(12 %>% hours()) %>%
           fit_to_timeline() %>%
           # No relevant midday if the shift doesn't include noon.
           # Note: hablar is attached last, so na_if() here is hablar's
           # condition-based version (it masks dplyr's), and add()/not()
           # come from magrittr
           na_if(not(. %within% interval(start, end)))) %>%
  # Make an original row ID since there doesn't seem to be one, and we will need
  # to build intervals within the data stemming from each original row
  rownames_to_column("orig_shift") %>%
  pivot_longer(cols = c(start, midday, end),
               # The timestamps we have here will be treated as start times
               values_to = "start",
               # Drop rows that would exist due to irrelevant middays
               values_drop_na = TRUE) %>%
  select(-name) %>%
  # Infer shift end times as the start of the next shift, within lines defined
  # by the original shifts
  group_by(orig_shift) %>%
  arrange(start) %>%
  mutate(end = lead(start)) %>%
  ungroup() %>%
  # Drop lines that represent the end of the last shift and not a full one
  drop_na() %>%
  # Compute those durations and times (should times really be globally
  # cumulative? Also, your specified mydf2 seems to have an incorrect first time
  # value)
  mutate(duration = start %--% end %>% as.numeric("hours"),
         time = cumsum(duration)) %>%
  select(id, start, end, day, duration, time))
#> # A tibble: 3 x 6
#>   id    start               end                   day duration  time
#>   <chr> <dttm>              <dttm>              <dbl>    <dbl> <dbl>
#> 1 m1    2020-01-03 10:00:00 2020-01-03 12:00:00     1      2     2
#> 2 m1    2020-01-03 12:00:00 2020-01-03 16:00:00     1      4     6
#> 3 m1    2020-01-03 19:20:00 2020-01-03 20:50:00     1      1.5   7.5
Created on 2019-10-23 by the reprex package (v0.3.0)
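The question asks for the split to work across different ids, and the comment in the pipeline wonders whether `time` should really be cumulative globally. If a running total per id is what is wanted, one possible variant is sketched below. It is not the answer's exact code: the id "m2" and its timestamps are invented for illustration, the names `shifts` and `split_at_noon` are hypothetical, the hablar and magrittr helpers are swapped for plain dplyr, and the noon test is a strict comparison rather than `%within%`.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(lubridate, warn.conflicts = FALSE)

# Hypothetical input: the question's rows for "m1" plus an invented id "m2"
shifts <- tibble(
  id    = c("m1", "m1", "m2"),
  start = as_datetime(c("2020-01-03 10:00:00", "2020-01-03 19:20:00",
                        "2020-01-03 11:00:00")),
  end   = as_datetime(c("2020-01-03 16:00:00", "2020-01-03 20:50:00",
                        "2020-01-03 13:30:00")),
  day   = 1
)

split_at_noon <- shifts %>%
  mutate(
    # Identify each original row so its pieces stay together
    orig_shift = row_number(),
    # Noon on the day the shift starts
    midday     = floor_date(start, "day") + hours(12),
    # Keep noon only when it falls strictly inside the shift
    midday     = replace(midday, !(midday > start & midday < end), NA)
  ) %>%
  # Treat start, noon (if any) and end as candidate start times
  pivot_longer(c(start, midday, end), values_to = "start",
               values_drop_na = TRUE) %>%
  select(-name) %>%
  group_by(orig_shift) %>%
  arrange(start, .by_group = TRUE) %>%
  # Each piece ends where the next piece of the same original row begins
  mutate(end = lead(start)) %>%
  # The last timestamp of each original row is only an end, not a piece
  filter(!is.na(end)) %>%
  ungroup() %>%
  mutate(duration = as.numeric(difftime(end, start, units = "hours"))) %>%
  # Running total restarts for every id
  group_by(id) %>%
  mutate(time = cumsum(duration)) %>%
  ungroup() %>%
  select(id, start, end, day, duration, time)

split_at_noon
```

With this input the result has five rows: three for "m1" with running totals 2, 6 and 7.5 hours, and two for "m2" with 1 and 2.5 hours. Grouping by `id` before `cumsum()` is the only change needed for per-id totals; grouping by `id` and `day` instead would restart the total each day.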