使用 auto_arima(SARIMAX) 和傅里叶项预测具有多个季节性的时间序列

Forecasting time series with multiple seasonaliy by using auto_arima(SARIMAX) and Fourier terms

我正在尝试通过使用 auto_arima 并添加傅立叶项作为外生特征来预测 Python 中的时间序列。数据来自kaggle的Store item demand forecasting challenge。它由 10 家商店和 50 个项目的长格式时间序列组成,从而产生 500 个时间序列相互堆叠。这个时间序列的特殊性在于它有具有每周和每年季节性的每日数据。

为了捕捉这两个级别的季节性,我首先使用了 Rob J Hyndman 在 Forecasting with daily data 中推荐的 TBATS,实际上效果很好。

我还关注了 TBATS python 库的创建者发布的 medium article,他将其与 SARIMAX + 傅立叶项(Hyndman 也推荐)进行了比较。

但是现在,当我尝试使用第二种方法将 pmdarima 的 auto_arima 和傅立叶项作为外生特征时,我得到了意想不到的结果。

在下面的代码中,我只使用了我拆分成训练和测试数据的train.csv文件(去年用于预测)并设置傅里叶项的最大阶数K = 2。

我的问题是我得到了一个平滑的预测(见下图),它似乎没有捕捉到每周的季节性,这与 article 结束时的结果不同。 我的代码有问题吗?

完整代码:

# imports
import pandas as pd
from pmdarima.preprocessing import FourierFeaturizer
from pmdarima import auto_arima
import matplotlib.pyplot as plt

# Upload the data that consist in a long format time series of multiple TS stacked on top of each other
# There are 10 (stores) * 50 (items) = 500 time series
train_data = pd.read_csv('train.csv', index_col='date', parse_dates=True)

# Select only one time series for store 1 and item 1 for the purpose of the example
train_data = train_data.query('store == 1 and item == 1').sales

# Prepare the fourier terms to add as exogenous features to auto_arima
# Annual seasonality covered by fourier terms
four_terms = FourierFeaturizer(365.25, 2)
y_prime, exog = four_terms.fit_transform(train_data)
exog['date'] = y_prime.index # is exactly the same as manual calculation in the above cells
exog = exog.set_index(exog['date'])
exog.index.freq = 'D'
exog = exog.drop(columns=['date'])


# Split the time series as well as exogenous features data into train and test splits 
y_to_train = y_prime.iloc[:(len(y_prime)-365)]
y_to_test =  y_prime.iloc[(len(y_prime)-365):] # last year for testing

exog_to_train = exog.iloc[:(len(exog)-365)]
exog_to_test = exog.iloc[(len(exog)-365):]


# Fit model
# Weekly seasonality covered by SARIMAX
arima_exog_model = auto_arima(y=y_to_train, exogenous=exog_to_train, seasonal=True, m=7)

# Forecast
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)
y_arima_exog_forecast = pd.DataFrame(y_arima_exog_forecast , index = pd.date_range(start='2017-01-01', end= '2017-12-31'))


# Plots
plt.plot(y_to_test, label='Actual data')
plt.plot(y_arima_exog_forecast, label='Forecast')
plt.legend()

提前感谢您的回答!

这里是答案,以防有人感兴趣。 再次感谢 Flavia Giammarino。

# imports
import pandas as pd
from pmdarima.preprocessing import FourierFeaturizer
from pmdarima import auto_arima
import matplotlib.pyplot as plt

# Upload the data that consists long format time series of multiple TS stacked on top of each other
# There are 10 (stores) * 50 (items) time series
train_data = pd.read_csv('train.csv', index_col='date', parse_dates=True)

# Select only one time series for store 1 and item 1 for the purpose of the example
train_data = train_data.query('store == 1 and item == 1').sales

# Prepare the fourier terms to add as exogenous features to auto_arima
# Annual seasonality covered by fourier terms
four_terms = FourierFeaturizer(365.25, 1)
y_prime, exog = four_terms.fit_transform(train_data)
exog['date'] = y_prime.index # is exactly the same as manual calculation in the above cells
exog = exog.set_index(exog['date'])
exog.index.freq = 'D'
exog = exog.drop(columns=['date'])


# Split the time series as well as exogenous features data into train and test splits 
y_to_train = y_prime.iloc[:(len(y_prime)-365)]
y_to_test =  y_prime.iloc[(len(y_prime)-365):] # last year for testing

exog_to_train = exog.iloc[:(len(exog)-365)]
exog_to_test = exog.iloc[(len(exog)-365):]


# Fit model
# Weekly seasonality covered by SARIMAX
arima_exog_model = auto_arima(y=y_to_train, D=1, exogenous=exog_to_train, seasonal=True, m=7)

# Forecast
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)
y_arima_exog_forecast = pd.DataFrame(y_arima_exog_forecast , index = pd.date_range(start='2017-01-01', end= '2017-12-31'))


# Plots
plt.plot(y_to_test, label='Actual data')
plt.plot(y_arima_exog_forecast, label='Forecast')
plt.legend()