Evaluating forecast accuracy with ETS and a loop function
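I am forecasting with a data set from the fpp2 package and the ets function from the forecast package. Because I am forecasting several time series, I use my own function to produce multiple forecasts at once.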
# CODE
library(fpp2) # required for the data
library(dplyr)
library(forecast)
MY_DATA<-uschange[,1:4]
head(MY_DATA)
tail(MY_DATA)
#1. Own forecasting function
FORECASTING_FUNCTION_ETS <- function(Z, hrz = 16) {
timeseries <- msts(Z, start = 1970, seasonal.periods = 4)
forecast <- ets(timeseries)
}
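To get more accurate projections, I want to use partitioning. Partitioning is done by trimming the series into two periods: the earlier period is the training set and the later period is the test set.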
#2.Partitioning (training and test set)
for (i in 1:20)
{ nTest <- 16*i
nTrain <- length(MY_DATA[,2:2])- nTest
train <- window(MY_DATA[,2:2],start=1970, end=c(2015,3),nTrain)
test <- window(MY_DATA[,2:2], start=1970, end=c(2016,3),nTrain+16)
s <- FORECASTING_FUNCTION_ETS(train)
sp<- predict(s,h=16)
cat("----------------------------------
Data Partition",i,"
Training Set includes",nTrain," time periods. Observations 1 to", nTrain, "
Test Set includes 16 time periods. Observations", nTrain+1, "to", nTrain+16,"
")
print(accuracy(sp,test))
cat("
")
print(sp$model)
}
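So far so good :) This code works for one series (Consumption), and I can see all the results for the training and test sets.
My goal, however, is to apply this partitioning not to one series but to all four series (Consumption, Income, Production and Savings) at once.
For that reason I tried the code below, where I use "[,i]" to get results for all four series: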
#3.Trying to upgrade code above
for (i in 1:20)
{ nTest[,i] <- 16*i
nTrain[,i] <- length(MY_DATA[,i])- nTest
train[,i] <- window(MY_DATA[,i],start=1970, end=c(2015,3),nTrain)
test[,i] <- window(MY_DATA[,i], start=1970, end=c(2016,3),nTrain+16)
s <- FORECASTING_FUNCTION_ETS(train[,i])
sp<- predict(s[,i],h=16)
cat("----------------------------------
Data Partition",i,"
Training Set includes",nTrain," time periods. Observations 1 to", nTrain, "
Test Set includes 16 time periods. Observations", nTrain+1, "to", nTrain+16,"
")
print(accuracy(sp,test))
cat("
")
print(sp$model)
}
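But something is wrong and this code does not work properly. Can anyone help me solve this problem and fix the code above?
This isn't exactly what you asked for, so I don't expect you to accept this answer, but it was an interesting problem to me, so I thought I'd offer an approach anyway.
I'm starting from the assumption that your main goal is to figure out how to iterate a process for evaluating the accuracy of a forecasting method across multiple time series. You want to do this with an expanding window, where you gradually increase the share of the data included in the training set while repeatedly forecasting a fixed number of steps ahead. That mimics how this task is often done in real life.
For simplicity, I'm also assuming that you don't really need to print all of the output to the console, and are more interested in the distribution and summary statistics of the accuracy metrics from those iterations (like the table at the end of the example you're trying to follow).
Starting from those assumptions, here is one way to do it.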
# Split your data frame into a list of one-column data frames (here, time series) using as.list,
# then use lapply to iterate your validation process over those series.
Y <- lapply(as.list(MY_DATA), function(x) {
# Instead of a for loop, let's use sapply to iterate over a vector of integers
# representing the width of the training set in our expanding window, starting at
# 70 percent of the full series and running to the series' end. Let's assume that,
# in each iteration, we're going to forecast the following four quarters.
sapply(ceiling(length(x) * 0.7):(length(x) - 4), function(i) {
# Because we're using indices instead of dates, we need to partition the
# series with subset instead of window. The training set runs from the start
# of the series to our integer, and the test set grabs the next 4 quarters.
train <- subset(x, end = i)
test <- subset(x, start = i + 1, end = i + 4)
# Now we fit an ETS model to that training set and use it to generate
# forecasts for the following 4 quarters.
mod <- ets(train)
preds <- predict(mod, h = 4)
# Finally, we check the accuracy of those forecasts against the test set...
check <- accuracy(preds, test)
# ...and return the accuracy metric of our choice (I've picked MAPE because
# that's the one used in the example you're trying to follow, but that's easy
# to change, or you could just return the accuracy object if you want options).
return(check["Test set", "MAPE"])
})
})
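In this case, the process returns a list of four vectors, each of length 53. Because those vectors are in a list, you can easily summarize them to get a sense of the overall accuracy for each series. I like to look at the distribution of an accuracy metric, which is easy to do here with a density plot. The simplest thing, of course, is to look at the central tendency: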
> sapply(Y, mean)
Consumption Income Production Savings
131.4818 172.7535 138.3171 106.9114
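The summaries and density plots mentioned above can be sketched with base R alone. This is only a minimal sketch: summarise_accuracy and plot_accuracy_densities are hypothetical helper names introduced here, and both assume their argument is a named list of numeric vectors shaped like Y.

```r
# Hypothetical helper: one row of summary statistics per series.
# Assumes acc_list is a named list of numeric vectors (like Y above).
summarise_accuracy <- function(acc_list) {
  t(sapply(acc_list, function(v) {
    c(mean = mean(v), median = median(v), sd = sd(v),
      min = min(v), max = max(v))
  }))
}

# Hypothetical helper: overlay one density curve per series.
plot_accuracy_densities <- function(acc_list, xlab = "MAPE") {
  dens <- lapply(acc_list, density)
  # Common axis limits so that every curve fits in the same panel.
  xlim <- range(unlist(lapply(dens, function(d) d$x)))
  ylim <- range(unlist(lapply(dens, function(d) d$y)))
  plot(NULL, xlim = xlim, ylim = ylim, xlab = xlab, ylab = "Density",
       main = "Distribution of forecast accuracy by series")
  for (k in seq_along(dens)) lines(dens[[k]], col = k)
  legend("topright", legend = names(acc_list),
         col = seq_along(dens), lty = 1)
  invisible(dens)
}
```

With the list from above, summarise_accuracy(Y) and plot_accuracy_densities(Y) would give the table and the plot. The median is worth a look alongside the mean, since a few very large errors can pull the mean up.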
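If you want to compare the results from ETS with those from another forecasting process, you can just swap out the model-fitting bit, rerun, and compare the summaries. Or you could fold that comparison into the process, using lapply instead of sapply and returning a matrix or data frame that shows the results from the two processes side by side.
As I said, I know this strays a bit from your attempt to directly implement the approach in that blog post, but I think it's in keeping with the spirit of your effort, and it was fun for me to work out.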
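Folding the comparison into the process might look roughly like the sketch below. The choices here are my assumptions, not part of the original approach: naive() from the forecast package stands in for "another forecasting process", compare_methods is a hypothetical helper name, and I return MAE instead of MAPE because the uschange series are percentage changes that sit close to zero, which inflates percentage errors.

```r
library(fpp2)      # provides the uschange data
library(forecast)  # ets(), naive(), forecast(), accuracy(), subset() for ts

MY_DATA <- uschange[, 1:4]

# Hypothetical helper: expanding-window comparison for a single series.
# Returns a matrix with one row per forecast origin and one column per method.
compare_methods <- function(x, h = 4, start_frac = 0.7) {
  origins <- ceiling(length(x) * start_frac):(length(x) - h)
  res <- lapply(origins, function(i) {
    train <- subset(x, end = i)
    test <- subset(x, start = i + 1, end = i + h)
    fc_ets <- forecast(ets(train), h = h)
    fc_naive <- naive(train, h = h)  # assumed benchmark, easy to swap out
    c(ETS = accuracy(fc_ets, test)["Test set", "MAE"],
      Naive = accuracy(fc_naive, test)["Test set", "MAE"])
  })
  do.call(rbind, res)
}

# Iterate over the column names so that each element is a univariate ts.
series_names <- colnames(MY_DATA)
comparison <- lapply(setNames(series_names, series_names), function(nm) {
  compare_methods(MY_DATA[, nm])
})

# Mean test-set MAE per series, with the two methods side by side.
mean_mae <- t(sapply(comparison, colMeans))
print(mean_mae)
```

Each element of comparison keeps the full set of per-origin errors, so the same list can also feed a density plot or a paired comparison of the two methods.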
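If you would rather stay close to the loop in the question, one possible sketch is a nested for loop: the outer loop walks over the four series and the inner loop over the partitions. Several things here are assumptions on my part. I cap the number of partitions at 5 (n_partitions is a name I made up), because 16*i for i up to 20 would exceed the 187 observations in uschange. I also split by position with subset, since the fixed end dates in the original window calls produce the same split on every pass. Results are collected in a data frame instead of being printed.

```r
library(fpp2)      # provides the uschange data
library(forecast)  # ets(), forecast(), accuracy(), subset() for ts

MY_DATA <- uschange[, 1:4]
h <- 16            # test-set length, as in the question
n_partitions <- 5  # assumption: keeps every training set reasonably long

rows <- list()
for (series_name in colnames(MY_DATA)) {
  x <- MY_DATA[, series_name]  # a single quarterly series
  for (i in seq_len(n_partitions)) {
    # Partition i trains on the first n_train points and tests on the next h.
    n_train <- length(x) - h * i
    train <- subset(x, end = n_train)
    test <- subset(x, start = n_train + 1, end = n_train + h)
    fit <- ets(train)
    fc <- forecast(fit, h = h)
    acc <- accuracy(fc, test)
    rows[[length(rows) + 1]] <- data.frame(
      series = series_name, partition = i, n_train = n_train,
      RMSE = acc["Test set", "RMSE"], MAE = acc["Test set", "MAE"],
      model = fit$method, stringsAsFactors = FALSE
    )
  }
}
results <- do.call(rbind, rows)
print(results)
```

The original error came from indexing objects such as nTest[,i] and train[,i] before they existed, and from reusing i both as the partition counter and as the column index. Keeping the series name and the partition number as separate loop variables avoids both problems.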