R中的递归滚动平均值

杰森·艾兹卡恩斯

从以下内容开始:

library(tidyverse)
library(lubridate)

df <- tibble(
  date = seq.Date(ymd("2018-01-01"), by = "month", length.out = 6),
  y = c(20, 10, 15, 35, 40, 50)
)

df 
#> # A tibble: 6 x 2
#>   date           y
#>   <date>     <dbl>
#> 1 2018-01-01    20
#> 2 2018-02-01    10
#> 3 2018-03-01    15
#> 4 2018-04-01    35
#> 5 2018-05-01    40
#> 6 2018-06-01    50

我想创建一个新列,z即递归滚动6周期平均值。也就是说,因为2018-07-01这仅仅是最后六个记录的平均值,但是对于2018-08-01向前,我们在新的滚动计算中使用了先前(相关)的滚动平均值。

2018-07-01 =平均值(c(20,10,15,35,40,50))= 28.3333 
2018-08-01 =平均值(c(10,15,35,40,50,28.3333))= 29.7222 
2018-09-01 =平均值(c(15,35,40,50,28.3333,29.7222)= 33.0093 
...等...

我曾尝试使用tibbletime::rollify进行一些操作zoo::rollmeanr,但都不允许我递归地引用上次计算的滚动平均值。

所需输出:

desired_df <- tibble(
  date = seq.Date(ymd("2018-01-01"), by = "month", length.out = 22),
  y = c(20, 10, 15, 35, 40, 50, rep(NA, 16)),
  z = c(
    rep(NA, 6), 
    28.3333, 29.7222, 33.0093, 36.0108, 36.1793, 35.5425, 33.1329, 
    33.9328, 34.6346, 34.9055, 34.7213, 34.4783, 34.3009, 34.4955, 
    34.5893, 34.5818
  )
)
desired_df
#> # A tibble: 22 x 3
#>    date           y     z
#>    <date>     <dbl> <dbl>
#>  1 2018-01-01    20  NA  
#>  2 2018-02-01    10  NA  
#>  3 2018-03-01    15  NA  
#>  4 2018-04-01    35  NA  
#>  5 2018-05-01    40  NA  
#>  6 2018-06-01    50  NA  
#>  7 2018-07-01    NA  28.3
#>  8 2018-08-01    NA  29.7
#>  9 2018-09-01    NA  33.0
#> 10 2018-10-01    NA  36.0
#> # ... with 12 more rows
德迈尔

我们可以创建一个使用简单for-loop作为简单解决方案的函数

recursive_roll <- function(x, fn = mean, window_size = 6, ...) {
    # Use fn (mean by default) on a rolling recursive window
    # ... are arguments passed to fn
    n <- length(x)
    result <- x
    for ( i in (window_size + 1):n ) {
        result[i] <- fn(result[(i - window_size):(i - 1)], ...)
    }
    # I add in this line below to make it in line with your desired output.
    # You may choose to omit this (keep the initial values of your vector),
    # or even make this part optional.
    result[1:window_size] <- NA
    return(result)
}

关于您的算法,需要注意的一点是最终它会收敛到一个将被重复的数字。我使用50个观察值而不是22个观察值来证明这一点:

library(dplyr)
library(lubridate)

N <- 50 # Total number of observations; I use 50 to illustrate convergence
window_size <- 6

df <- tibble(
    date = seq.Date(ymd("2018-01-01"), by = "month", length.out = N),
    y = c(20, 10, 15, 35, 40, 50, rep(NA, N - window_size))
)

desired_df <- df %>% mutate(z = recursive_roll(y))

让我们检查结果:

desired_df
# A tibble: 50 x 3
   date           y     z
   <date>     <dbl> <dbl>
 1 2018-01-01    20  NA  
 2 2018-02-01    10  NA  
 3 2018-03-01    15  NA  
 4 2018-04-01    35  NA  
 5 2018-05-01    40  NA  
 6 2018-06-01    50  NA  
 7 2018-07-01    NA  28.3
 8 2018-08-01    NA  29.7
 9 2018-09-01    NA  33.0
10 2018-10-01    NA  36.0
# … with 40 more rows
tail(desired_df)
# A tibble: 6 x 3
  date           y     z
  <date>     <dbl> <dbl>
1 2021-09-01    NA  34.5
2 2021-10-01    NA  34.5
3 2021-11-01    NA  34.5
4 2021-12-01    NA  34.5
5 2022-01-01    NA  34.5
6 2022-02-01    NA  34.5

plot(desired_df$date, desired_df$z, type = "l")

在此处输入图片说明

更具体地说,您的算法收敛到的数量可以解析为

r <- sum(1:window_size * head(desired_df$y, window_size)) / sum(1:window_size)

使用后N = 500,我们看到

desired_df$z[N] == r
# [1] TRUE
sprintf("%.17f", c(desired_df$z[N], r))
# [1] "34.52380952380952550" "34.52380952380952550"

这是因为您只使用window_size观察值;您可能更喜欢的是指数加权移动平均线:

ewma <- function(x, weight = 1 / (length(x) + 1)) {
    # Gives the exponentially weighted moving average, defined as:
    # EWMA_t = weight * x_t + (1 - weight) * EWMA_{t-1}
    result <- x
    for ( i in 2:length(x) ) {
        result[i] <- weight * result[i] + (1 - weight) * result[i - 1]
    }
    return(result)
}

set.seed(123)
N <- 50
x <- c(20, 10, 15, 35, 40, 50)
df <- tibble(
    date = seq.Date(ymd("2018-01-01"), by = "month", length.out = N),
    y = c(x, sample(30:50, size = N - window_size, replace = TRUE))
)

df2 <- df %>% mutate(z = recursive_roll(y), z2 = ewma(y))
plot(df2$date, df2$y, pch = 20, col = "#80808080")
lines(df2$date, df2$z, col = "blue")
lines(df2$date, df2$z2, col = "red")

在此处输入图片说明

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章