Forecasting model predict one day ahead - sliding window

Question

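I've run into a problem. I'm using SparkR for time-series forecasting, but the situation carries over to a plain R environment as well. Instead of an ARIMA model, I want to use a regression model such as random-forest regression to forecast the load one day ahead. I've also read about using a sliding-window approach to evaluate how different regressors perform across different parameter combinations. To make this clearer, here is an example of my dataset's structure: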

Timestamp              UsageCPU     UsageMemory   Indicator  Delay
2014-01-03 21:50:00    3123            1231          1        123
2014-01-03 22:00:00    5123            2355          1        322
2014-01-03 22:10:00    3121            1233          2        321
2014-01-03 22:20:00    2111            1234          2        211
2014-01-03 22:30:00    1000            2222          2         0 
2014-01-03 22:40:00    4754            1599          1         0

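To use any kind of regressor, the next step is to extract features and convert them into a format the model can read, because these regressors cannot work with timestamps directly: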

Year   Month  Day  Hour    Minute    UsageCPU   UsageMemory  Indicator Delay
2014   1      3    21       50        3123        1231          1      123
2014   1      3    22       00        5123        2355          1      322
2014   1      3    22       10        3121        1233          2      321
2014   1      3    22       20        2111        1234          2      211
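Below is a minimal base-R sketch of this feature extraction. It makes a few assumptions:

- The timestamps are strings in `%Y-%m-%d %H:%M:%S` format.
- The data sit in an ordinary data.frame. In SparkR the built-in column functions would be used instead.
- `extract_time_features` is a hypothetical helper name.

```r
# Hypothetical helper: split a timestamp column into numeric calendar features
extract_time_features <- function(df, col = "Timestamp") {
  ts <- as.POSIXlt(df[[col]], format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  features <- data.frame(
    Year   = ts$year + 1900,  # POSIXlt counts years from 1900
    Month  = ts$mon + 1,      # POSIXlt months are 0-based
    Day    = ts$mday,
    Hour   = ts$hour,
    Minute = ts$min
  )
  # keep all other columns, drop the raw timestamp
  cbind(features, df[setdiff(names(df), col)])
}

raw <- data.frame(
  Timestamp   = c("2014-01-03 21:50:00", "2014-01-03 22:00:00"),
  UsageCPU    = c(3123, 5123),
  UsageMemory = c(1231, 2355),
  Indicator   = c(1, 1),
  Delay       = c(123, 322)
)
feat <- extract_time_features(raw)
print(feat)
```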

The next step is to create the training and test sets for the model.

trainTest <- randomSplit(SparkDF, c(0.7, 0.3), seed = 42)
train <- trainTest[[1]]
test <- trainTest[[2]]
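`randomSplit` assigns rows to the two sets at random. As a result, training rows can lie after test rows in time. If the rows are in time order, a chronological split is a common alternative for evaluating forecasts.

Below is a base-R sketch on a plain data.frame with toy data. `chrono_split` is a hypothetical helper, not a SparkR function. In SparkR, the same idea would be a filter on the time column.

```r
# Hypothetical helper: rows are assumed to be sorted by time; the first `frac`
# of them become the training set and all later rows the test set
chrono_split <- function(df, frac = 0.7) {
  n_train <- floor(nrow(df) * frac)
  is_train <- seq_len(nrow(df)) <= n_train
  list(train = df[is_train, , drop = FALSE],
       test  = df[!is_train, , drop = FALSE])
}

# toy data: 10 time-ordered observations
df <- data.frame(t = 1:10, UsageCPU = seq(1000, 5500, by = 500))
parts <- chrono_split(df, frac = 0.8)   # rows 1-8 train, rows 9-10 test
```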

Then the model can be built and used for prediction (the randomForest settings don't matter for now):

model <- spark.randomForest(train, UsageCPU ~ ., type = "regression", maxDepth = 5, maxBins = 16)
predictions <- predict(model, test)

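So I know all of these steps, and plotting the predictions against the actual data looks quite good. But this regression model is not dynamic, which means I can't forecast one day ahead. Features such as UsageCPU and UsageMemory don't exist yet for the next day, so I want to predict the next day from historical values. As mentioned at the beginning, a sliding-window approach could work here, but I'm not sure how to apply it: on the whole dataset, or only on the training or test set?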
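One common way to make a regressor forecast ahead is to build the training table so that every predictor is already known at forecast time. The inputs are lagged values up to time t, and the target is the value `horizon` steps later.

Below is a minimal base-R sketch on a plain vector. It assumes regularly spaced, time-sorted observations. `make_lagged` and `latest_row` are hypothetical helpers. With 10-minute data, a one-day horizon would be `horizon = 144`.

```r
# Hypothetical helper for direct h-step-ahead forecasting:
# predictors are the last `n_lags` observed values up to time t,
# the target is the value `horizon` steps later (t + horizon)
make_lagged <- function(x, n_lags, horizon) {
  n <- length(x)
  last_t <- n - horizon                 # latest t whose target is still observed
  stopifnot(last_t >= n_lags)
  t_idx <- n_lags:last_t
  X <- sapply(0:(n_lags - 1), function(k) x[t_idx - k])  # column k holds x[t - k]
  X <- matrix(X, nrow = length(t_idx))
  colnames(X) <- paste0("lag", 0:(n_lags - 1))
  data.frame(X, target = x[t_idx + horizon])
}

# Hypothetical helper: the input row for the real forecast (lag0 = newest value)
latest_row <- function(x, n_lags) {
  n <- length(x)
  as.data.frame(matrix(x[n - 0:(n_lags - 1)], nrow = 1,
                       dimnames = list(NULL, paste0("lag", 0:(n_lags - 1)))))
}

sup <- make_lagged(1:10, n_lags = 3, horizon = 2)  # first row: lag0=3, lag1=2, lag2=1, target=5
newest <- latest_row(1:10, n_lags = 3)             # lag0=10, lag1=9, lag2=8
```

Any regressor could then be fitted on `target ~ .`, with `newest` as the input for the forecast.

Covering every step of the next day this way means one model per horizon (the direct strategy). The alternative is to predict one step and feed the prediction back in as input (the recursive strategy).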

This implementation comes from shabbychef and mbq:

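slideMean <- function(x, windowsize = 3, slide = 2) {
  idx1 <- seq(1, length(x), by = slide)           # start index of each window
  idx2 <- idx1 + windowsize                       # one past the end of each window
  idx2[idx2 > (length(x) + 1)] <- length(x) + 1   # clip windows at the end of the series
  cx <- c(0, cumsum(x))                           # prefix sums: cx[k] = sum(x[1:(k-1)])
  # window sums via prefix-sum differences, divided by the actual window length
  # so that truncated windows at the end are averaged correctly
  (cx[idx2] - cx[idx1]) / (idx2 - idx1)
}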

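My last question concerns the window size. I want to forecast the next day hour by hour (00, 01, 02, 03, ...), but the timestamps are at 10-minute intervals. By my calculation, the window size should therefore be 144 (24 * 60 / 10).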
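The window-size arithmetic can be checked directly. A day has 24 * 60 = 1440 minutes, so at one reading every 10 minutes a day spans 144 steps. After averaging to hourly values, a day spans 24 steps.

Below is a small base-R sketch. It assumes the series starts at midnight and has no gaps. The readings are simulated, not the poster's data.

```r
# Minutes per day divided by the sampling interval gives steps per day
interval_min   <- 10
steps_per_day  <- 24 * 60 / interval_min   # 144 ten-minute steps
steps_per_hour <- 60 / interval_min        # 6 ten-minute steps per hour

# Simulated day of 10-minute CPU readings, aggregated to hourly means
set.seed(1)
cpu    <- round(runif(steps_per_day, 1000, 5000))
hour   <- rep(0:23, each = steps_per_hour)  # hour label (00..23) for each reading
hourly <- tapply(cpu, hour, mean)           # 24 hourly means
```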

It would be great if someone could help me. Thanks!

Answer 1

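I ran into the same problem when using neural networks for time-series forecasting. I implemented many models, and the one that worked best was a sliding window combined with a neural network. Other researchers in the field confirmed this to me as well. From this we concluded that forecasting 1 day ahead (a 24-step horizon) in a single step makes training demanding on the system. We did the following: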

1. We used a sliding window of 24 hours. For illustration, let's use [1,2,3] here.
2. Then use the ML model to predict [4], i.e. use the value 4 as the target.
   # As an illustration we had
   x = [1,2,3]
   # then set the target as
   y = [4]
   # We had a function that returns x = [1,2,3] and y = [4] and
   # shifts the window for the next training step.
3. To
   x = [1,2,3]
   we can add further features that are important to the model:
   x = [1,2,3,feature_x]
4. Then we minimise the error and shift the window to get:
   x = [2,3,4,feature_x] and y = [5]
5. You could also predict two values ahead, e.g. [4,5].
6. Use a list to collect the outputs and plot them.
7. Make predictions after training.
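The steps above can be sketched in base R. This is an illustration under stated assumptions, not the answerer's code:

- `lm()` stands in for the neural network.
- The series is simulated with `arima.sim()`.
- `make_windows` and `forecast_recursive` are hypothetical helpers.

`make_windows(1:5, 3)` reproduces the answer's example: x = [1,2,3], y = [4], then x = [2,3,4], y = [5]. The forecast loop predicts one step, appends the prediction and drops the oldest value. That is one way of reaching 24 steps (one day of hourly values) from a one-step model.

```r
# Hypothetical helper: every window of `window` consecutive values (oldest ->
# newest) becomes a row of X, and the value that follows it becomes the target y
make_windows <- function(x, window) {
  # embed() returns rows (x[t], x[t-1], ..., x[t-window]); reverse the columns
  m <- embed(x, window + 1)[, (window + 1):1, drop = FALSE]
  X <- m[, 1:window, drop = FALSE]
  colnames(X) <- paste0("x", 1:window)
  list(X = X, y = m[, window + 1])
}

w <- make_windows(1:5, 3)   # rows of w$X: 1 2 3 / 2 3 4; w$y: 4 5

# Fit on windows of a simulated series (lm() stands in for the network)
set.seed(42)
series <- as.numeric(arima.sim(list(ar = 0.7), n = 200))
win <- 24
d <- make_windows(series[1:150], win)
fit <- lm(y ~ ., data = data.frame(d$X, y = d$y))

# Recursive forecast: predict one step, append it, drop the oldest value, repeat
forecast_recursive <- function(fit, history, window, steps) {
  buf <- tail(history, window)
  out <- numeric(steps)
  for (s in seq_len(steps)) {
    newx <- as.data.frame(matrix(buf, nrow = 1,
                                 dimnames = list(NULL, paste0("x", 1:window))))
    out[s] <- predict(fit, newdata = newx)
    buf <- c(buf[-1], out[s])   # shift the window by one step
  }
  out
}

fc <- forecast_recursive(fit, series[1:150], win, steps = 24)
```

Extra features such as `feature_x` (step 3) would be added as further columns of `X`. They would then also have to be available for every forecast step.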

Tags: statistics, r, machine-learning, prediction, sliding-window