auto_arima crashes when a single value in the series is changed
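I am developing a time series forecasting model with pmdarima.
My time series is short, but reasonably well behaved. The following code raises an error in sklearn\utils\validation.py: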
from pmdarima import auto_arima
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
import datetime
import pandas as pd
datelist = pd.date_range('2018-01-01', periods=24, freq='MS')
sales = [26.000000,27.100000,26.000000,28.014286,28.057143,
30.128571,39.800000,33.000000,37.971429,45.914286,
37.942857,33.885714,36.285714,34.971429,40.042857,
27.157143,30.685714,35.585714,43.400000,51.357143,
45.628571,49.942857,42.028571,52.714286]
df = pd.DataFrame(data=sales,index=datelist,columns=['sales'])
observations = df['sales']
size = df['sales'].size
shape = df['sales'].shape
maxdate = max(df.index).strftime("%Y-%m-%d")
mindate = min(df.index).strftime("%Y-%m-%d")
asc = seasonal_decompose(df, model='add')
if asc.seasonal[asc.seasonal.notnull()].size == df['sales'].size:
    seasonality = True
else:
    seasonality = False
# Check Stationarity
aftest = adfuller(df['sales'])
if aftest[1] <= 0.05:
    stationarity = True
else:
    stationarity = False
results = auto_arima(observations,
                     seasonal=seasonality,
                     stationary=stationarity,
                     m=12,
                     error_action="ignore")
~\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
584 " minimum of %d is required%s."
585 % (n_samples, array.shape, ensure_min_samples,
--> 586 context))
587
588 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.
However, if I change the first value of the sales series from 26 to 30, it works.
What could be the problem here?
Your example is not reproducible, because seasonality and stationarity are currently not defined in the global scope. This causes auto_arima to throw an error of the form
NameError: name 'seasonality' is not defined
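The kind of failure described above can be reproduced without pmdarima. The sketch below assumes the two flags were assigned inside a function in the original script, which is a guess; the helper names compute_flags and compute_flags_fixed are hypothetical and only serve the illustration.

```python
def compute_flags():
    # These names are local to the function and disappear when it returns.
    seasonality = True
    stationarity = False


compute_flags()

try:
    print(seasonality)  # no global name 'seasonality' exists yet
except NameError as exc:
    message = str(exc)
    print(message)


def compute_flags_fixed():
    # Returning the values lets the caller bind them in its own scope.
    return True, False


seasonality, stationarity = compute_flags_fixed()
print(seasonality, stationarity)
```

Returning the flags and binding them at the call site keeps them visible where auto_arima is called.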
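You have only very few observations, so try explicitly setting min/max order values for the different ARIMA processes. IMO, this is generally good practice. In your case, we can do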
fit = auto_arima(
    observations,
    start_p = 0, start_q = 0, start_P = 0, start_Q = 0,
    max_p = 3, max_q = 3, max_P = 3, max_Q = 3,
    D = 1, max_D = 2, m = 12,
    seasonal = True,
    error_action = 'ignore')
Here we consider processes up to MA(3) and AR(3), as well as SMA(3) and SAR(3).
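To make "very few observations" concrete, here is a minimal plain-Python sketch of how seasonal differencing with m = 12 shortens a 24-point series. The helper seasonal_diff is hypothetical and is not how pmdarima implements differencing; the sketch is also not a confirmed diagnosis of the traceback, only an illustration of how quickly the usable sample shrinks.

```python
def seasonal_diff(values, m, D=1):
    """Apply lag-m differencing D times and return the shortened list."""
    out = list(values)
    for _ in range(D):
        # Each pass subtracts the value m steps earlier and drops m points.
        out = [out[i] - out[i - m] for i in range(m, len(out))]
    return out


sales = [26.000000, 27.100000, 26.000000, 28.014286, 28.057143,
         30.128571, 39.800000, 33.000000, 37.971429, 45.914286,
         37.942857, 33.885714, 36.285714, 34.971429, 40.042857,
         27.157143, 30.685714, 35.585714, 43.400000, 51.357143,
         45.628571, 49.942857, 42.028571, 52.714286]

# Number of observations left after D rounds of seasonal differencing.
remaining = {D: len(seasonal_diff(sales, m=12, D=D)) for D in (0, 1, 2)}
print(remaining)  # {0: 24, 1: 12, 2: 0}
```

With two years of monthly data, one seasonal difference leaves a single year to fit on and a second one leaves nothing, which is why bounding the orders explicitly is worthwhile on a series this short.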
Let's visualize the original time series data together with the forecasts
import matplotlib.pyplot as plt

n_ahead = 10
# Forecast n_ahead months together with confidence intervals
preds, conf_int = fit.predict(n_periods = n_ahead, return_conf_int = True)
# Monthly x-axis covering the 24 observations plus the forecast horizon
xrange = pd.date_range(min(datelist), periods = 24 + n_ahead, freq = 'MS')
fig = plt.figure()
plt.plot(xrange[:df.shape[0]], df["sales"])
plt.plot(xrange[df.shape[0]:], preds)
plt.fill_between(
    xrange[df.shape[0]:],
    conf_int[:, 0], conf_int[:, 1],
    alpha = 0.1, color = 'b')
plt.show()
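For readers who want to check which dates the forecast segment of the x-axis covers, the following standard-library sketch rebuilds the same month-start axis that pd.date_range produces above and splits it into history and forecast. The helper month_starts is hypothetical and only mirrors the month-start frequency for this illustration.

```python
from datetime import date


def month_starts(start, periods):
    """Return `periods` consecutive month-start dates beginning at `start`."""
    out = []
    year, month = start.year, start.month
    for _ in range(periods):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1
    return out


n_obs, n_ahead = 24, 10
axis = month_starts(date(2018, 1, 1), n_obs + n_ahead)

# The first n_obs dates belong to the data, the rest to the forecast.
history_dates, forecast_dates = axis[:n_obs], axis[n_obs:]
print(forecast_dates[0], forecast_dates[-1])  # 2020-01-01 2020-10-01
```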