How to find the longest consecutive subsequence of a long sequence in R

Grec001

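I have a sequence as a toy example. How can I find the longest run of consecutive values? I can already find where the break points are, but how do I get the values themselves?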

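library(data.table)
DT <- data.table(X = c(3:7, 16:18, 22:29, 31:36))
# Y = previous X minus current X; it equals -1 inside a consecutive run
DT[, Y := shift(X, type = "lag", fill = -1)][, Y := Y - X]
with(DT, which(Y != -1))  # row indices where a new run starts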
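The break points already contain what is needed: each one marks the start of a run, so the distance between consecutive starts is that run's length. A minimal base-R sketch of that step, assuming `X` is sorted, strictly increasing and integer-valued (the names `x`, `starts`, `lens` and `longest` are illustrative):

```r
x <- c(3:7, 16:18, 22:29, 31:36)

# Start index of every run: position 1 plus every position whose step from the
# previous value is not exactly 1 (on this data: 1, 6, 9, 17)
starts <- which(c(TRUE, diff(x) != 1))

# Length of each run = distance to the next start (or past the end of the vector)
lens <- diff(c(starts, length(x) + 1L))

# Pick the longest run (which.max returns the first one on ties) and slice it out
i <- which.max(lens)
longest <- x[starts[i]:(starts[i] + lens[i] - 1L)]
print(longest)
```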

What I want are the values of that subsequence; in this case the result should be c(22, 23, 24, 25, 26, 27, 28, 29).
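Another base-R idea, not taken from this thread: for sorted, strictly increasing integers, `x - seq_along(x)` stays constant inside a run of consecutive values and increases at every gap, so `rle()` yields the run lengths directly. A sketch under that assumption (names are illustrative):

```r
x <- c(3:7, 16:18, 22:29, 31:36)

# Offset from the position index is constant within a consecutive run
r <- rle(x - seq_along(x))

# Longest run (first one on ties) and the position where it ends
i <- which.max(r$lengths)
end <- cumsum(r$lengths)[i]
longest <- x[(end - r$lengths[i] + 1L):end]
print(longest)
```

Duplicated values would break the "constant offset" property, so this only applies to unique sorted input.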

chinsoon12

Another option that should be faster as the data gets larger:

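DT[DT[, {
    rl <- cumsum(c(1L, diff(X) > 1L))  # run id: increments after every gap
    rw <- rowid(rl)                     # position of each row within its run
    .I[rl == rl[which.max(rw)]]         # row numbers of the longest run
}]]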
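To see what those three lines compute, here is the same logic applied to the toy vector in plain base R. `ave(..., FUN = seq_along)` stands in for data.table's `rowid()` so the sketch runs without data.table (an assumption of this illustration). `rl` labels each run, `rw` counts positions inside it, and `which.max(rw)` lands on the last element of the longest run. Like the original, it assumes `X` is sorted without duplicates, since a repeated value gives a difference of 0 and is not treated as a break.

```r
x <- c(3:7, 16:18, 22:29, 31:36)

# Run id: starts at 1 and increments after every gap (step > 1)
rl <- cumsum(c(1L, diff(x) > 1L))

# Position of each element inside its run (base-R equivalent of rowid(rl))
rw <- ave(seq_along(x), rl, FUN = seq_along)

print(rbind(x, rl, rw))

# which.max(rw) points at the end of the longest run; keep every row of that run
idx <- which(rl == rl[which.max(rw)])
print(x[idx])
```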

Timing code:

library(data.table)
set.seed(0L)
nr <- 1e7
ngap <- nr/2
DT <- data.table(X=sample(nr, ngap))
setorder(DT, X)

mtd0 <- function() {
    DT[, length := .N, by = cumsum(c(1, diff(X) != 1))][length == max(length), X]
}

mtd1 <- function() {
    ls <- split(DT$X, cumsum(c(TRUE, diff(DT$X) != 1)))
    DT[X %in% ls[[which.max(lengths(ls))]], X]
}

mtd2 <- function() {
    DT[DT[, {
        rl <- cumsum(c(1L, diff(X)>1L))
        rw <- rowid(rl)
        .I[rl==rl[which.max(rw)]]}], X]
}

bench::mark(mtd0(), mtd1(), mtd2(), check=FALSE)

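Output (mtd0() returns both runs tied at the maximum length of 22, whereas mtd1() and mtd2() return only the first one, which is why the benchmark uses check = FALSE):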

> mtd0()
[1] 4622514 4622515 4622516 4622517 4622518 4622519 4622520 4622521 4622522 4622523 4622524 4622525 4622526 4622527 4622528 4622529 4622530 4622531 4622532
[20] 4622533 4622534 4622535 8390357 8390358 8390359 8390360 8390361 8390362 8390363 8390364 8390365 8390366 8390367 8390368 8390369 8390370 8390371 8390372
[39] 8390373 8390374 8390375 8390376 8390377 8390378
> mtd1()
[1] 4622514 4622515 4622516 4622517 4622518 4622519 4622520 4622521 4622522 4622523 4622524 4622525 4622526 4622527 4622528 4622529 4622530 4622531 4622532
[20] 4622533 4622534 4622535
> mtd2()
[1] 4622514 4622515 4622516 4622517 4622518 4622519 4622520 4622521 4622522 4622523 4622524 4622525 4622526 4622527 4622528 4622529 4622530 4622531 4622532
[20] 4622533 4622534 4622535
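If each of the tied longest runs is wanted as a separate vector (mtd0() concatenates them into one), a base-R sketch can split by run id and keep every run of maximal length. The toy data and names here are illustrative, and `X` is assumed sorted without duplicates:

```r
# Toy vector with two runs tied at the maximum length of 4
x <- c(1:4, 10:11, 20:23)

# One list element per run of consecutive values
runs <- split(x, cumsum(c(TRUE, diff(x) != 1)))
lens <- lengths(runs)

# Keep every run whose length equals the maximum, dropping the group names
longest_runs <- unname(runs[lens == max(lens)])
print(longest_runs)
```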

Timings:

# A tibble: 3 x 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result     memory                time     gc              
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>     <list>                <list>   <list>          
1 mtd0()        1.34s    1.34s     0.747     363MB     1.49     1     2      1.34s <int [44]> <df[,3] [42 x 3]>     <bch:tm> <tibble [1 x 3]>
2 mtd1()        2.13s    2.13s     0.470     548MB     1.88     1     4      2.13s <int [22]> <df[,3] [34,671 x 3]> <bch:tm> <tibble [1 x 3]>
3 mtd2()     642.91ms 642.91ms     1.56      343MB     4.67     1     3   642.91ms <int [22]> <df[,3] [29 x 3]>     <bch:tm> <tibble [1 x 3]>
