Can't update StatsModels SARIMAX with new observation (ValueError)

Question

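I'm trying to run out-of-sample validation on a time series dataset, using scikit-learn's TimeSeriesSplit() to create the train/test folds.

The idea is to train statsmodels' SARIMAX on the training fold and then validate on the test fold without refitting the model. To do that, we have to iteratively append new observations from the test fold to the model, one at a time, before each forecast.

However, I get a ValueError at that append step: ValueError: Given `endog` does not have an index that extends the index of the model.

This makes no sense to me. If I print out print(max(train_fold.index), min(test_fold.index)) for each fold, the last index of the train fold is clearly lower than the first index of the test fold. In my case: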

1983-05 1983-06
1984-05 1984-06
1985-05 1985-06
1986-05 1986-06
1987-05 1987-06

Here's the complete code as it stands. I'm sure I'm doing something stupid, but I'm stuck:

import statsmodels.api as sm
from sklearn.model_selection import TimeSeriesSplit

# Create a generator that yields the indices of our train and test folds
split = TimeSeriesSplit(n_splits=5).split(train_series)

# Loop through each fold
for train_idcs, test_idcs in split:

    # Create an empty prediction list to append to
    predictions = []

    # Create the folds
    train_fold = train_series[train_idcs]
    test_fold = train_series[test_idcs]

    # Fit the model on the training fold
    model_instance = sm.tsa.statespace.SARIMAX(
        train_fold,
        order=(1, 0, 0),
        seasonal_order=(1, 0, 0, 12),
        simple_differencing=True,
        enforce_stationarity=False,
        enforce_invertibility=False,
    )
    model_fitted = model_instance.fit(disp=False)

    # Create the initial prediction
    # Take the forecast value only (positional access on the returned Series)
    pred = model_fitted.forecast(steps=1).iloc[0]
    predictions.append(pred)

    # Now loop through the test set, adding observations individually,
    # and getting the next prediction
    for i in range(len(test_fold)):

        # Get the next row
        next_row = test_fold.iloc[
            i : i + 1
        ]  # Returns single row but in series form (which statsmodels expects)

        # Append the row to the model
        model_fitted.append(next_row, refit=False)

        # Get the new prediction
        # Take the forecast value only (positional access on the returned Series)
        pred = model_fitted.forecast(steps=1).iloc[0]
        predictions.append(pred)

    print(predictions)

model_fitted.append(next_row, refit=False) is the point of failure. Any ideas? Thanks!

Answer 1

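Got it! So dumb.

The .append() method of the fitted SARIMAX results returns a new results object that includes the extra observations; it does not change the data stored in the existing one. In the loop above, the first call works but its return value is thrown away, so on the next iteration model_fitted still ends at the last training observation and the next test row no longer directly follows it. Hence the ValueError.

So the correct code is simply: model_fitted = model_fitted.append(next_row, refit=False)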
