构建 ARIMA 模型

Question

我的数据是时间序列：

library(forecast)    
library(Mcomp)
ts.data<- subset(M3, 12)[[551]]
print(ts.data)

输出：

Series: M551
Type of series: INDUSTRY
Period of series: MONTHLY
Series description: Lumber, softwoods, western pine, production

HISTORICAL data
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1983  5960  6320  7050  7080  7170  7620  7000  7790  7830  7580  6620  6860
1984  7240  6550  8730  8690  7650  7950  7210  7880  7040  7940  7290  5750
1985  6410  6230  7260  8220  7770  7830  7590  9210  8340  8660  7330  7000
1986  7500  7430  8350  9010  8470  8900  8720  9710 10070  9610  8410  8640
1987  8500  8920 10470  8960  9390 10630  9380 10050  9030 10580  9350  8810
1988  9300 10430 10570 10440 10120  8780  7460  8230  9830 10260  9270  9260
1989  9260  8150  9930  8840  9150 10230  9340 10170  9150 10420  8690  8960
1990  9960  9050 10430  9650  9350  8790  8550  8790  7590  8730  7520  6110
1991  7450  7230  7680  8270  8060  8210  8260  8910  8540  8180  7430  6880
1992  7360  7560  8800  7550  7900  8320  7430                              

FUTURE data
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1992                                    7650 7460 8580 7300 6530
1993 7070 6940 7060 7470 6190 6310 7090 7310 6310 7270 7240 6410
1994 7470

如何构建 ARIMA 模型，然后找到模型的平均绝对误差 (MAE)？

任何想法都会有所帮助。

Answer 1

这是一种方法（注意：要正确执行此操作，您还应该在建模之前调查 ACF/PACF 并在生成初始模型后执行交叉验证）：

# Install pacakges if they are not already installed: necessary_packages => vector
necessary_packages <- c("forecast", "Mcomp")

# Create a vector containing the names of any packages needing installation:
# new_pacakges => vector
new_packages <- necessary_packages[!(necessary_packages %in%
                                       installed.packages()[, "Package"])]

# If the vector has more than 0 values, install the new pacakges
# (and their) associated dependencies:
if(length(new_packages) > 0){install.packages(new_packages, dependencies = TRUE)}

# Initialise the packages in the session: list of boolean => stdout (console)
lapply(necessary_packages, require, character.only = TRUE)

# Subset the M3 data to contain the relevant series: 
ts.data <- subset(M3, 12)[[551]]

# Extract the historical data: 
historical_production_ts <- ts.data[["x"]]

# Set the seed for reproducibility: 
set.seed(2020)

# It is clear from this series we have trend and seasonality:
decomposed_ts <- decompose(historical_production_ts)

# Chart the decomposed series: 
plot(decomposed_ts)

# Note: we have to make series stationary in order to compute 
# an ARIMA model, to do this we need to account for trend and
# seasonality: 

# Use a unit root test to determine the number of differences required 
# to make the trend of the series stationary: 
trend_required_diffs <- ndiffs(decomposed_ts$trend)

# Use a unit root test to determin  the number of differences required 
# to make the seasonality of the series stationary: 
seasonality_required_diffs <- ndiffs(decomposed_ts$seasonal)

# Create an auto-arima model and store the result in the fit variable:
fit <- auto.arima(historical_production_ts,
           # Account for trend:
           d = trend_required_diffs, 
           # Account for seasonality: 
           D = seasonality_required_diffs, 
           # Try out alot of different models: 
           stepwise = FALSE,
           # Don't approximate the AIC: 
           approximation = FALSE)

# Check the Mean Absolute Error (MAE) of the model: 
data.frame(accuracy(fit))[,"MAE"]

# Forecast out the number of future observations required:  
aa_fcast <- forecast(fit, length(ts.data$xx))

# Chart the forecast:
autoplot(aa_fcast) + 
  autolayer(ts.data$xx, series = "Actual Production (Future)") +
  autolayer(aa_fcast$mean, series = "Forecasts")

# A function to calculate the MAE: 
mae <- function(actual, pred){
  mean(abs(pred-actual))
}

# Calculate the accuracy of the forecast against the future data: 
mean_abs_error <- mae(ts.data$xx, data.frame(aa_fcast)[,"Point.Forecast", drop = TRUE])

构建 ARIMA 模型

Build ARIMA model

r

time-series

arima