Bad performance of TBATS and NNETAR functions when forecasting on a long-term basis in R

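I have been using the tbats and nnetar functions from the forecast package to generate hourly electricity load forecasts over one-week and one-month horizons, and both models perform satisfactorily. My dataset contains hourly values from January 2017 to early May 2022 (46,848 values). However, when I try to forecast the hourly load through the end of the year (07/05/2022-31/12/2022, 5,736 hourly values), the results either go flat or lose their seasonality. Does anyone know why long-term forecasts give such poor results? Any ideas about either model would be highly appreciated. I apologize for the very large dataset.

I have uploaded the dataset to GitHub: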

df <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/LOAD_2017_2022.csv", sep=";")
#fix datetime
df$TIME<- with(df, sprintf("%02d:00", TIME-1))
df$DATE<-as.Date(df$DATE, "%d/%m/%Y")
df$TIME <- paste(df$TIME, ':00', sep = '')
View(df)

library(ggpubr)
library(chron)
df$TIME <- chron(times=df$TIME)

DATETIME<-as.POSIXct(paste(df$DATE, df$TIME), origin = "1970-01-01 00:00:00", tz="UTC", usetz=TRUE)
my_df <- data.frame(timestamp = as.POSIXct(DATETIME, format = "%d.%m.%Y %H:%M", origin = "1970-01-01 00:00:00", tz = "UTC"), input = df[,3])
my_df <- setNames(my_df, c("DATETIME","LOAD"))

The TBATS results in particular lose their seasonality and look odd. The code I used is as follows:

library(ggplot2)
library(forecast)
library(tseries)
library(dplyr)

Load = ts(my_df[, c('LOAD')])
my_df$Clean_Load = tsclean(Load)
Clean_Load = ts(my_df[, c('Clean_Load')])
load_ts = ts(Clean_Load)

msts <- msts(load_ts, seasonal.periods=c(24,168,8760), start=c(2017,01))
plot(msts, main="Load", xlab="Year", ylab="MWh")

s <- tbats(msts)
sp<- predict(s,h=5736) 

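When I run the nnetar function, with or without temperature as an external regressor, the results are also flat. I tried different lambda values, but none seemed to work: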

#create dataframe for temperature historical values
Temperature_history <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/Temperature_history.csv", sep=";")

DATETIME<-as.POSIXct(Temperature_history$Datetime, format = "%d/%m/%Y %H:%M", tz="UCT", usetz=TRUE)
Temperature_df <- data.frame(timestamp = as.POSIXct(DATETIME, format = "%d/%m/%Y %H:%M", tz = "UCT"), input = Temperature_history$Temperature)
Temperature_df<- setNames(Temperature_df, c("DATETIME","TEMPERATURE"))


#create dataframe for temperature forecasted values
Temperature_forecast <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/Temperature_forecast.csv", sep=";")

DATETIME2<-as.POSIXct(Temperature_forecast$datehour, format = "%d/%m/%Y %H:%M", tz="UCT", usetz=TRUE)
Temp_forecast <- data.frame(timestamp = as.POSIXct(DATETIME2, format = "%d/%m/%Y %H:%M", tz = "UCT"), input = Temperature_forecast$TEMP_FORECAST)
View(Temp_forecast)
Temp_forecast <- setNames(Temp_forecast, c("DATETIME","TEMPERATURE"))
View(Temp_forecast)

#define and run NN model
library(forecast)
myts = ts(my_df$LOAD, frequency = 24)

fit2 = nnetar(myts,xreg = Temperature_df$TEMPERATURE, lambda = 0.5, P=1, MaxNWts=1177)
nnetforecast <- forecast(fit2, xreg = Temp_forecast$TEMPERATURE, h = 5736, PI = F, npaths=100, bootstrap = TRUE)  
autoplot(nnetforecast, h = 5736)

First, your code won't run because the GitHub link does not point to a CSV file. Replace the first line with the following:

df <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/LOAD_2017_2022.csv", sep=";")

Then, running your code, I get reasonable results from the tbats model for the first couple of weeks:

sp <- forecast(s,h=14*24) 
autoplot(sp, include=14*24)

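Using a time series model to forecast much further ahead than that makes no sense here.

In any case, there are well-developed electricity demand models that do better than TBATS or NNETAR. For a simple starting point, try Tao Hong's vanilla model, described in Section 2.2 of https://doi.org/10.1016/j.ijforecast.2015.09.006. It is just a linear regression, but it will do better than any of the models you are trying.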
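To give a sense of what such a regression looks like, here is a minimal sketch in base R. It is not taken from the paper: the formula follows the form usually attributed to the vanilla benchmark (a linear trend, calendar factors for month, weekday and hour, and a cubic polynomial in temperature interacted with month and hour), so check the exact terms against Section 2.2 before relying on it. The data are synthetic, and the column names (`load`, `trend`, `month`, `weekday`, `hour`, `temp`) are placeholders, not the columns of the dataset in the question.

```r
# Sketch of a vanilla-style benchmark regression on SYNTHETIC hourly data.
# All names and the data-generating process are illustrative assumptions.
set.seed(42)

n_train <- 24 * 7 * 52   # 52 weeks of hourly history
n_test  <- 24 * 7 * 8    # 8 weeks to forecast
n       <- n_train + n_test

datetime <- seq(as.POSIXct("2021-01-04 00:00:00", tz = "UTC"),
                by = "hour", length.out = n)
lt <- as.POSIXlt(datetime)

# Synthetic temperature: yearly cycle + daily cycle + noise
temperature <- 15 - 10 * cos(2 * pi * lt$yday / 365) +
  4 * sin(2 * pi * (lt$hour - 9) / 24) + rnorm(n, sd = 2)

# Synthetic demand: trend + daily shape + weekend dip + U-shaped temperature response
demand <- 5000 + 0.01 * seq_len(n) +
  600 * sin(2 * pi * (lt$hour - 7) / 24) -
  400 * (lt$wday %in% c(0, 6)) +
  8 * (temperature - 18)^2 + rnorm(n, sd = 50)

dat <- data.frame(
  load    = demand,
  trend   = seq_len(n),
  month   = factor(lt$mon + 1, levels = 1:12),
  weekday = factor(lt$wday, levels = 0:6),
  hour    = factor(lt$hour, levels = 0:23),
  temp    = temperature
)

train <- dat[seq_len(n_train), ]
test  <- dat[n_train + seq_len(n_test), ]

# Trend, month, weekday x hour, and a cubic in temperature
# interacted with both month and hour
fit <- lm(load ~ trend + month + weekday * hour +
            (temp + I(temp^2) + I(temp^3)) * (month + hour),
          data = train)

# Forecasting needs temperature values for the forecast period
pred <- predict(fit, newdata = test)
mape <- mean(abs((test$load - pred) / test$load)) * 100
cat("Holdout MAPE (%):", round(mape, 2), "\n")
```

With real data you would build the same columns from the load and temperature history for fitting, and from the temperature forecast for prediction. Because the model depends only on calendar variables and temperature, its forecasts keep their daily and weekly shape however far ahead you go, provided the temperature inputs are reasonable.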