I have a straightforward, regularly spaced time series, for example:
library(lubridate)
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
> series
[1] "2018-01-01 00:00:00 UTC" "2018-01-01 00:10:00 UTC" "2018-01-01 00:20:00 UTC" "2018-01-01 00:30:00 UTC"
[5] "2018-01-01 00:40:00 UTC" "2018-01-01 00:50:00 UTC" "2018-01-01 01:00:00 UTC" "2018-01-01 01:10:00 UTC"
[9] "2018-01-01 01:20:00 UTC" "2018-01-01 01:30:00 UTC" "2018-01-01 01:40:00 UTC" "2018-01-01 01:50:00 UTC"
[13] "2018-01-01 02:00:00 UTC" "2018-01-01 02:10:00 UTC" "2018-01-01 02:20:00 UTC" "2018-01-01 02:30:00 UTC"...
I also have an irregular set of error states, each with an on and an off time, for example:
error = data.frame(
  on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
  off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
)
> error
                   on                 off
1 2018-01-01 00:13:57 2018-01-01 00:21:32
2 2018-01-01 01:01:44 2018-01-01 02:33:45
How can I flag my series with the errors, as shown below?
> flag
series error
[1] "2018-01-01 00:00:00 UTC" "OK"
[2] "2018-01-01 00:10:00 UTC" "OK"
[3] "2018-01-01 00:20:00 UTC" "ERROR"
[4] "2018-01-01 00:30:00 UTC" "ERROR"
[5] "2018-01-01 00:40:00 UTC" "OK"
[6] "2018-01-01 00:50:00 UTC" "OK"
[7] "2018-01-01 01:00:00 UTC" "OK"
[8] "2018-01-01 01:10:00 UTC" "ERROR"
[9] "2018-01-01 01:20:00 UTC" "ERROR"
[10] "2018-01-01 01:30:00 UTC" "ERROR"
[11] "2018-01-01 01:40:00 UTC" "ERROR"
[12] "2018-01-01 01:50:00 UTC" "ERROR"
[13] "2018-01-01 02:00:00 UTC" "ERROR"
[14] "2018-01-01 02:10:00 UTC" "ERROR"
[15] "2018-01-01 02:20:00 UTC" "ERROR"
[16] "2018-01-01 02:30:00 UTC" "ERROR"
[17] "2018-01-01 02:40:00 UTC" "ERROR"
[18] "2018-01-01 02:50:00 UTC" "OK"
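Here is a solution using map_lgl (from purrr), because combining lubridate intervals with dplyr was interesting to me. Note that I use ceiling_date on off to reproduce your desired output, although it is not clear to me why the last row of each error block counts as ERROR: for example, row 4 of the output, "2018-01-01 00:30:00 UTC", falls after the first off value, "2018-01-01 00:21:32". The key part is simply to create the intervals with interval (or, alternatively, on %--% off) and then use any(%within%) to return a logical value saying whether a given value of the series lies inside one of the error intervals. ifelse then lets us convert those values into character flags.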
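To isolate the interval test described above, here is a minimal sketch that checks two timestamps against the two error intervals. It assumes only that lubridate is installed; the names t1, t2 and intvs are illustrative and not part of the original answer.

```r
library(lubridate)

# Error on/off times from the question (ymd_hms parses them as UTC by default).
on  <- ymd_hms(c("2018-01-01 00:13:57", "2018-01-01 01:01:44"))
off <- ymd_hms(c("2018-01-01 00:21:32", "2018-01-01 02:33:45"))

# Two equivalent ways to build the intervals.
intvs <- on %--% off          # same as interval(on, off)

# Hypothetical probe timestamps: one inside the first interval, one outside both.
t1 <- ymd_hms("2018-01-01 00:20:00")
t2 <- ymd_hms("2018-01-01 00:30:00")

t1 %within% intvs             # one logical per interval: TRUE FALSE
any(t1 %within% intvs)        # TRUE:  00:20:00 lies in the first interval
any(t2 %within% intvs)        # FALSE: 00:30:00 is after the first off time
```

The last line shows why the raw off times do not reproduce the desired output by themselves: without rounding off up, 00:30:00 would be flagged OK.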
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
# Regular 10-minute series over one day
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
error = data.frame(
  on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
  off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
) %>%
  mutate(
    # Round each off time up to the next 10-minute mark to match the desired output
    off = ceiling_date(off, unit = "10 minutes"),
    # One lubridate interval per error period
    intvs = interval(on, off)
  )
series %>%
  tibble(dttm = .) %>%
  # TRUE when the timestamp falls inside at least one error interval
  bind_cols(status = map_lgl(series, ~ any(. %within% error$intvs))) %>%
  # Convert the logical flag into a character label
  mutate(status = ifelse(status, "ERROR", "OK")) %>%
  print(n = 20)
#> # A tibble: 145 x 2
#> dttm status
#> <dttm> <chr>
#> 1 2018-01-01 00:00:00 OK
#> 2 2018-01-01 00:10:00 OK
#> 3 2018-01-01 00:20:00 ERROR
#> 4 2018-01-01 00:30:00 ERROR
#> 5 2018-01-01 00:40:00 OK
#> 6 2018-01-01 00:50:00 OK
#> 7 2018-01-01 01:00:00 OK
#> 8 2018-01-01 01:10:00 ERROR
#> 9 2018-01-01 01:20:00 ERROR
#> 10 2018-01-01 01:30:00 ERROR
#> 11 2018-01-01 01:40:00 ERROR
#> 12 2018-01-01 01:50:00 ERROR
#> 13 2018-01-01 02:00:00 ERROR
#> 14 2018-01-01 02:10:00 ERROR
#> 15 2018-01-01 02:20:00 ERROR
#> 16 2018-01-01 02:30:00 ERROR
#> 17 2018-01-01 02:40:00 ERROR
#> 18 2018-01-01 02:50:00 OK
#> 19 2018-01-01 03:00:00 OK
#> 20 2018-01-01 03:10:00 OK
#> # ... with 125 more rows
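For comparison, the same flagging can be sketched in base R without purrr or lubridate. This is not the method used above, only an alternative under the same assumption that each off time is rounded up to the next 10-minute mark; the names off_up, s, hit and flag are illustrative.

```r
# Base R only: regular 10-minute series over one day (145 points), in UTC.
start  <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
end    <- as.POSIXct("2018-01-02 00:00:00", tz = "UTC")
series <- seq(start, end, by = 600)

on  <- as.POSIXct(c("2018-01-01 00:13:57", "2018-01-01 01:01:44"), tz = "UTC")
off <- as.POSIXct(c("2018-01-01 00:21:32", "2018-01-01 02:33:45"), tz = "UTC")

# Round each off time up to the next multiple of 600 seconds
# (for these values, the same result as ceiling_date(off, "10 minutes")).
off_up <- ceiling(as.numeric(off) / 600) * 600

# hit[i, j] is TRUE when series[i] lies inside error interval j (both ends included).
s   <- as.numeric(series)
hit <- outer(s, as.numeric(on), ">=") & outer(s, off_up, "<=")

# A timestamp is flagged when it falls inside at least one interval.
flag <- data.frame(
  series = series,
  error  = ifelse(rowSums(hit) > 0, "ERROR", "OK")
)
head(flag, 18)
```

The outer comparison builds a 145 x 2 logical matrix in one step, which avoids looping over the series; for a very large number of error intervals the matrix grows accordingly, so this is a sketch for small inputs like the one in the question.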