auto_arima crashing with single value change on series

I am developing a time series forecasting model using pmdarima.

My time series is short, but it is reasonably well behaved. The following code raises an error in sklearn\utils\validation.py:

from pmdarima import auto_arima
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
import datetime
import pandas as pd

datelist = pd.date_range('2018-01-01', periods=24, freq='MS')

sales = [26.000000,27.100000,26.000000,28.014286,28.057143,
         30.128571,39.800000,33.000000,37.971429,45.914286,
         37.942857,33.885714,36.285714,34.971429,40.042857,
         27.157143,30.685714,35.585714,43.400000,51.357143,
         45.628571,49.942857,42.028571,52.714286]


df = pd.DataFrame(data=sales,index=datelist,columns=['sales'])

observations = df['sales']
size = df['sales'].size
shape = df['sales'].shape
maxdate = max(df.index).strftime("%Y-%m-%d")
mindate = min(df.index).strftime("%Y-%m-%d")


asc = seasonal_decompose(df, model='add')

if asc.seasonal[asc.seasonal.notnull()].size == df['sales'].size:
    seasonality = True
else:
    seasonality = False

# Check Stationarity
aftest = adfuller(df['sales'])

if aftest[1] <= 0.05:
    stationarity = True
else:
    stationarity = False

results = auto_arima(observations,
                     seasonal=seasonality,
                     stationary=stationarity,
                     m=12,
                     error_action="ignore")

~\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    584                              " minimum of %d is required%s."
    585                              % (n_samples, array.shape, ensure_min_samples,
--> 586                                 context))
    587 
    588     if ensure_min_features > 0 and array.ndim == 2:

ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

However, if I change the first value of the sales series from 26 to 30, it works.

What could be wrong here?

  1. Your example is not reproducible, because as posted seasonality and stationarity are not defined in the global scope. This causes auto_arima to throw an error of the form

    NameError: name 'seasonality' is not defined

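  2. You have very few observations, so try explicitly setting the min/max order values for the different ARIMA processes. IMO, this is generally good practice. In your case, we can do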

    fit = auto_arima(
        observations,
        start_p = 0, start_q = 0, start_P = 0, start_Q = 0,
        max_p = 3, max_q = 3, max_P = 3, max_Q = 3,
        D = 1, max_D = 2, m = 12,
        seasonal = True,
        error_action = 'ignore')
    

    Here we consider processes up to MA(3) and AR(3), as well as SMA(3) and SAR(3).

  3. Let's visualize the original time series data together with the forecasts

    n_ahead = 10
    preds, conf_int = fit.predict(n_periods = n_ahead, return_conf_int = True)
    xrange = pd.date_range(min(datelist), periods = 24 + n_ahead, freq = 'MS')
    
    import matplotlib.pyplot as plt
    import matplotlib.dates as dates
    
    fig = plt.figure()
    plt.plot(xrange[:df.shape[0]], df["sales"])
    plt.plot(xrange[df.shape[0]:], preds)
    plt.fill_between(
        xrange[df.shape[0]:],
        conf_int[:, 0], conf_int[:, 1],
        alpha = 0.1, color = 'b')
    plt.show()
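
To make the "very few observations" remark in point 2 concrete, here is a minimal sketch of how quickly seasonal differencing uses up a 24-point monthly series. It relies only on the fact that each lag-m difference drops m observations. The helper name seasonal_diff is hypothetical (it is not a pmdarima function), and whether this is what actually triggers the ValueError in the question is an assumption, not something confirmed above.

```python
# The 24 monthly sales values from the question.
sales = [26.000000, 27.100000, 26.000000, 28.014286, 28.057143,
         30.128571, 39.800000, 33.000000, 37.971429, 45.914286,
         37.942857, 33.885714, 36.285714, 34.971429, 40.042857,
         27.157143, 30.685714, 35.585714, 43.400000, 51.357143,
         45.628571, 49.942857, 42.028571, 52.714286]


def seasonal_diff(x, m, D):
    """Hypothetical helper: apply lag-m differencing D times.

    Each pass computes x[t] - x[t - m], so it shortens the series by m.
    """
    x = list(x)
    for _ in range(D):
        x = [later - earlier for earlier, later in zip(x, x[m:])]
    return x


# Number of observations left for each seasonal differencing order D (m = 12).
remaining = {D: len(seasonal_diff(sales, m=12, D=D)) for D in range(3)}
print(remaining)  # {0: 24, 1: 12, 2: 0}
```

With m = 12, a single seasonal difference already halves the series, and a second one leaves nothing to fit, which is one reason to keep the differencing and order bounds small on a series this short.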