Rolling square root and similar functions

Username

I want to create a set of rolling error functions. I have the following data:

dat <- data.frame(
  date = seq.Date(from = as.Date("2010-01-01"), by = 1, length.out = 100),
  pred = sample(1000, 100, replace = FALSE),
  actual = sample(1000, 100, replace = FALSE)
)

Which looks like:

          date pred actual
1   2010-01-01   99    835
2   2010-01-02  429    779
3   2010-01-03  726    581

I want to use the rollapply function to create rolling squared errors. I can use the following to create a rolling mean:

window_size = 30 + 1 -1
dat %>%
  arrange(desc(date)) %>%
  mutate(
    error = (pred - actual),
    squared_error = error**2,

    #rolling calcs
    rolling_mean_error = c(rollapply(error, width = window_size, by = 1, FUN = mean), rep(NA, window_size - 1))
  )

However, I want to use a squared_error function.

squared_function <- function(err){
  err**2
}


dat %>%
  arrange(desc(date)) %>%
  mutate(
    error = (pred - actual),
    squared_error = error**2,

    #rolling calcs
    rolling_mean_error = c(rollapply(error, width = window_size, by = 1, FUN = mean), rep(NA, window_size - 1)),
    rolling_squared_error = c(rollapply(error, width = window_size, by = 1, FUN = squared_function), rep(NA, window_size - 1))
  )

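However, it fails with the following error:

Error: `mutate()` argument `rolling_squared_error` must be recyclable.
i `rolling_squared_error` is `c(...)`.
x `rolling_squared_error` can't be recycled to size 100.
i `rolling_squared_error` must be size 100 or 1, not 2159.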

Edit:

Libraries:

library(dplyr)
library(zoo)

r2evans

squared_function should return a single number, not a vector the same length as its input. I suspect you want sum(err**2) (the sum of squares) or sqrt(sum(err**2)).

Try this:
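
To see where the 2159 in the error message comes from, here is a minimal sketch. It assumes a made-up stand-in vector `err` rather than the question's data, and the helper names `per_window` and `per_window_sum` are only for illustration. With a function that returns 30 values per window, rollapply hands back 30 values for each of the 100 - 30 + 1 = 71 windows, and the manual NA padding adds 29 more:

```r
library(zoo)

# Stand-in for the `error` column: any numeric vector of length 100 will do.
err <- seq_len(100) - 50
window_size <- 30

# FUN returns 30 values per window, and there are 100 - 30 + 1 = 71 windows,
# so the result holds 71 * 30 = 2130 values.
per_window <- rollapply(err, width = window_size, FUN = function(e) e**2)
length(per_window)

# c() flattens that result and appends 29 NAs: 2130 + 29 = 2159 values,
# which mutate() cannot recycle into 100 rows.
length(c(per_window, rep(NA, window_size - 1)))

# Reducing each window to a single number gives one value per window instead.
per_window_sum <- rollapply(err, width = window_size, FUN = function(e) sum(e**2))
length(per_window_sum)
```

With one value per window, padding with 29 NAs (or using fill = NA) brings the result back to 100 values, which is what mutate() expects.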


set.seed(42)
dat <- tibble(
  date   = seq.Date(from = as.Date("2010-01-01"), by = 1, length.out = 100),
  pred   = sample(1000, 100, replace = FALSE),
  actual = sample(1000, 100, replace = FALSE)
) %>% 
  setNames(c("date", "pred", "actual"))

window_size <- 30 + 1 -1

squared_function <- function(err) sum(err**2)

dat2 <- dat %>%
  arrange(desc(date)) %>%
  mutate(
    error = (pred - actual),
    squared_error = error**2,
    rolling_mean_error = zoo::rollapply(
      error, width = window_size, by = 1, FUN = mean,
      align = "left", fill = NA),
    rolling_squared_error = zoo::rollapply(
      error, width = window_size, by = 1, FUN = squared_function,
      align = "left", fill = NA)
  )
dat2
# # A tibble: 100 x 7
#    date        pred actual error squared_error rolling_mean_error rolling_squared_error
#    <date>     <int>  <int> <int>         <dbl>              <dbl>                 <dbl>
#  1 2010-04-10   558    659  -101         10201              -93.7               5334540
#  2 2010-04-09   672    671     1             1              -88.6               5326839
#  3 2010-04-08   466    102   364        132496              -70.7               5616282
#  4 2010-04-07   302    481  -179         32041              -67.0               5711315
#  5 2010-04-06   665     49   616        379456              -66.6               5707835
#  6 2010-04-05   839     66   773        597529              -86.8               5328479
#  7 2010-04-04   954    908    46          2116             -103.                4817975
#  8 2010-04-03   190    118    72          5184              -92.8               4935575
#  9 2010-04-02     1    713  -712        506944              -90.2               4952295
# 10 2010-04-01   608    944  -336        112896              -53.0               4610187
# # ... with 90 more rows
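
The same pattern should extend to other per-window summaries. As a sketch only (the name `rolling_rmse_function` and the made-up `err` vector are assumptions for illustration, not part of the question), a rolling root-mean-squared error just needs a different function that still returns one number per window:

```r
library(zoo)

# Hypothetical helper: one number (the RMSE) per window of errors.
rolling_rmse_function <- function(err) sqrt(mean(err**2))

set.seed(42)
err <- sample(1000, 100) - sample(1000, 100)  # made-up errors, length 100

rolling_rmse <- rollapply(err, width = 30, FUN = rolling_rmse_function,
                          align = "left", fill = NA)

length(rolling_rmse)      # same length as the input
sum(is.na(rolling_rmse))  # the windows that run off the end are filled with NA
```

Inside mutate() the call would look the same, with `error` in place of `err`.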

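Explanation of align="left"

Each time squared_function is called, it is given 30 numbers and needs to return 1. One question: where does that one number go?

Let's look at a contrived example: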

vec <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0)
zoo::rollapply(vec, 5, FUN = mean)
#  [1] 3 4 5 6 7 6 5 4 3 2 3 4 5 6 7 6

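The result has length 16 (20 - 5 + 1 windows). When we need it to be the same length as the original vector, we can use fill=NA to pad it out. But this raises a question: where does each number go?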

zoo::rollapply(vec, 5, FUN = mean, fill = NA)
#  1,  2,  3,  4,  5,  6,  7,  8,  9,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  0
# `------. ,--------'
#         v
# __, __, __, __, __
#  3,                 align="left"
# NA, NA,  3          align="center" (the default)
# NA, NA, NA, NA,  3  align="right"
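
As a small check of the above, this sketch runs the contrived vector through each alignment and reports where the first window's mean lands (the variable names `left`, `center` and `right` are just for illustration):

```r
library(zoo)

vec <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0)

left   <- rollapply(vec, 5, FUN = mean, fill = NA, align = "left")
center <- rollapply(vec, 5, FUN = mean, fill = NA, align = "center")
right  <- rollapply(vec, 5, FUN = mean, fill = NA, align = "right")

# Position of the first window's mean (3) under each alignment.
which(!is.na(left))[1]    # 1
which(!is.na(center))[1]  # 3
which(!is.na(right))[1]   # 5
```

In the answer's code the data is sorted by descending date, so align = "left" puts each window's result on the row of its most recent date, with the NAs at the bottom, matching the question's manual c(..., rep(NA, window_size - 1)) padding.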
