How to create a sliding window in R to divide data into test and train samples to test accuracy of forecasts?

Question

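We are using the forecast package in R to read 3 weeks of hourly data (3*7*24 data points) and predict the next 24 hours. It is a time series with multiple seasonality.

Our forecast model runs and appears to perform well. Now we want to quantify the accuracy of our data approach/forecasting algorithm, and we would like to use the accuracy function from the forecast package for this. We understand how accuracy works: if f is the forecast and x is the vector of actual observations, then accuracy(f,x) gives us several accuracy measures for that forecast.

We have data for the last few months, and we want to write a sliding-window algorithm that selects (3*7*24) hourly values and then predicts the next 24 hours. It should then compare those predictions against the actual data for the next day (24 hours), display the accuracy, slide the window forward by 24 points (one day), and repeat.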

The sample data is generated as follows:

library("forecast")

time <- 1:(12*168)
set.seed(1)
ds <- msts(sin(2*pi*time/24) +
             c(1,1,1.2,0.8,1,0,0)[((time-1)%/%24)%%7+1] +
             time/400 + rnorm(length(time),0,0.2),
           seasonal.periods=c(24,168))
plot(ds)
head(ds)
tail(ds)
length(ds)
length(time)

The forecasting procedure is as follows:

model <- tbats(ds[1:504])
fcst <- forecast(model,h=24,level=90)
accuracy(fcst,ds[505:528])     ##Test accuracy of forecast against next/actual 24 values
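
One detail worth checking in the snippet above: indexing a time series with `[` returns a plain numeric vector, so the seasonal periods stored by msts() are not passed on to tbats(). The sketch below illustrates this on a simplified series and shows one possible way to keep the periods, by re-wrapping the slice with msts(). Whether that is the intended behaviour is an assumption; the variable names `raw` and `train` are illustrative only.

```r
library(forecast)

set.seed(1)
time <- 1:(12 * 168)
# Simplified stand-in for the example series, with the same seasonal periods
ds <- msts(sin(2 * pi * time / 24) + rnorm(length(time), 0, 0.2),
           seasonal.periods = c(24, 168))

raw <- ds[1:504]      # `[` drops the ts/msts attributes
is.ts(raw)            # FALSE: tbats(raw) would see no seasonal information

# Re-wrap the slice so the model is told about both seasonal periods
train <- msts(ds[1:504], seasonal.periods = c(24, 168))
attr(train, "msts")   # the seasonal periods are present again
```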

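Now we want to slide the "window" by 24 and repeat the same procedure: the next set of values used to build the model will be ds[25:528], its accuracy will be tested against ds[529:552], and so on. How do we implement this?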

Also, is there a better way to test the overall accuracy of this forecasting algorithm for our scenario?
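
To make the windowing described in the question concrete, here is a minimal base R sketch that only enumerates the index ranges, without fitting any model. The sizes (2016 observations, 504 training points, 24-point horizon and step) are taken from the example above; the names `train_len`, `h`, and `windows` are illustrative.

```r
# Assumed setup, taken from the question: 2016 hourly observations,
# 504 training points, a 24-point horizon, window advanced by 24 points.
n         <- 12 * 168
train_len <- 3 * 7 * 24
h         <- 24

# First index of each training window: 1, 25, 49, ...
train_start <- seq(1, n - train_len - h + 1, by = h)

windows <- data.frame(
  train_start = train_start,
  train_end   = train_start + train_len - 1,
  test_start  = train_start + train_len,
  test_end    = train_start + train_len + h - 1
)

head(windows, 2)   # second row is 25, 528, 529, 552, as in the question
nrow(windows)      # 63 windows; the last test window ends at index 2016
```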

Answer 1

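I would do this by creating a vector of time steps that represent the leading edges of the sliding windows, then using lapply to iterate the forecasting-and-scoring process over the windows implied by those edges. Like this: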

# set a couple of parameters we'll use to slice the series into chunks:
# window width (w) and the time step at which you want to end the first
# training set
w = 24 ; start = 504

# now use those parameters to make a vector of the time steps at which each
# window will end
steps <- seq(start + w, length(ds), by = w)

# using lapply, iterate the forecasting-and-scoring process over the
# windows just defined
cv_list <- lapply(steps, function(x) {

    train <- ds[1:(x - w)] 
    test <- ds[(x - w + 1):x]

    model <- tbats(train)
    fcst <- forecast(model, h = w, level = 90)
    accuracy(fcst, test)

})
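
Note that the training set in the code above always starts at index 1, so it grows by 24 points at each step (an expanding window) rather than keeping a fixed width of 504 points. If a fixed-width window like the ds[25:528] example in the question is wanted, a sketch along the following lines might work. The helper `sliding_accuracy` and its arguments are hypothetical names, not part of the forecast package, and re-wrapping each slice with msts() is an assumption about how the seasonal periods should be handled.

```r
library(forecast)

# Hypothetical helper: score fixed-width sliding windows over a series `y`.
# `fit` is any model function whose result forecast() accepts (tbats by default).
sliding_accuracy <- function(y, train_len = 504, h = 24,
                             periods = c(24, 168), fit = tbats) {
  # index of the last test observation in each window
  ends <- seq(train_len + h, length(y), by = h)
  lapply(ends, function(end) {
    train_idx <- (end - h - train_len + 1):(end - h)  # always train_len points
    test_idx  <- (end - h + 1):end                    # the following h points
    # `[` drops the time series attributes, so re-wrap the training slice
    train <- msts(as.numeric(y[train_idx]), seasonal.periods = periods)
    fcst  <- forecast(fit(train), h = h, level = 90)
    accuracy(fcst, as.numeric(y[test_idx]))
  })
}

# Usage with the example series (fits one tbats model per window, so it is slow):
# cv_list <- sliding_accuracy(ds)
```

Passing the model function as an argument makes it easy to try a cheaper model first and confirm the windowing before running tbats over every window.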

Sample output for the first window:

> cv_list[[1]]
                       ME      RMSE       MAE        MPE     MAPE      MASE
Training set 0.0001587681 0.3442898 0.2689754 34.3957362 84.30841 0.9560206
Test set     0.2619029897 0.8961109 0.7868256 -0.6832273 36.64301 2.7966186
                   ACF1
Training set 0.02588145
Test set             NA

If you want a summary of one score across the whole list, you can do something like:

rmse <- mean(unlist(lapply(cv_list, '[[', "Test set","RMSE")))

...which produces this:

[1] 1.011177
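
The same idea can be extended to every metric at once. The sketch below stacks the "Test set" row from each window into one matrix so that per-window scores can be inspected or averaged. `test_scores` is a hypothetical helper, and `toy_cv` is a stand-in with made-up numbers so the example runs without fitting any models; with real results you would pass cv_list instead.

```r
# Hypothetical helper (not part of forecast): stack the "Test set" row of each
# accuracy() matrix into one matrix with a row per window.
test_scores <- function(cv_list) {
  do.call(rbind, lapply(cv_list, function(m) m["Test set", , drop = FALSE]))
}

# Toy stand-in for cv_list with made-up numbers, only so the sketch runs alone
make_acc <- function(me, rmse) {
  matrix(c(0, 0.3, me, rmse), nrow = 2, byrow = TRUE,
         dimnames = list(c("Training set", "Test set"), c("ME", "RMSE")))
}
toy_cv <- list(make_acc(0.5, 0.9), make_acc(-0.1, 1.1))

scores <- test_scores(toy_cv)
colMeans(scores)            # mean of each metric across windows
apply(scores, 2, median)    # median is less sensitive to a single bad window
```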


r

forecasting