Unrealistic Mean Squared Error with statsmodels ARIMA

Question

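Preface: I have no idea what I'm doing.

For a uni statistics class we have to do some time series forecasting in Python.

I basically followed this tutorial, but with my own data: https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3

Everything works fine except the MSE.

When I plot everything, it looks like this:

This is the data I used for the MSE:

Original data (transactions['2016-05-01':]):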

DATE_BOOKING
2016-05-01    11327.548387
2016-06-01    11534.000000
2016-07-01    11391.677419
2016-08-01    11259.451613
2016-09-01    11968.366667
2016-10-01     7844.387097
2016-11-01     6270.800000
2016-12-01     5103.516129
2017-01-01     4631.032258
2017-02-01     5092.928571
2017-03-01     7800.258065
2017-04-01     8359.133333
2017-05-01     9495.062500

Predicted (forecast) data (pred.predicted_mean):

DATE_BOOKING
2016-05-01     9375.120610
2016-06-01    11038.420268
2016-07-01    11571.006853
2016-08-01    10856.183244
2016-09-01    10148.262512
2016-10-01     9433.060067
2016-11-01     7044.780142
2016-12-01     5037.930509
2017-01-01     5337.963486
2017-02-01     5767.081120
2017-03-01     6616.610224
2017-04-01     9389.836132
2017-05-01    10258.791544

I am calculating the MSE as follows:

import numpy as np
# Forecasts and observed values over the same date range
transactions_forecasted = pred.predicted_mean
transactions_truth = transactions['2016-05-01':]
# MSE is the mean of the squared forecast errors; RMSE is its square root
mse = ((transactions_forecasted - transactions_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))

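This is the result:
The Mean Squared Error of our forecasts is 1130250.12
The Root Mean Squared Error of our forecasts is 1063.13

Compared to other MSEs I've found by googling, this seems ridiculously high.

Can you tell me what I'm doing wrong?

I can post more (all) of the code if needed.

Thanks in advance!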

Answer 1

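Mean squared errors cannot be compared across datasets, because their magnitude depends on the units of the data. So you cannot compare the MSE you got here with the MSEs you see in example problems that use other data.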
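To make the unit dependence concrete, here is a minimal sketch that uses the values posted in the question. The names `truth`, `pred`, and the helper `mse` are only illustrative and are not part of the original code. It computes the MSE once in the original units and once with the same data expressed in thousands; because the errors are squared, rescaling the data by a factor of 1000 should rescale the MSE by a factor of 1000 squared.

```python
import numpy as np

# Observed and forecast values copied from the question (May 2016 to May 2017).
truth = np.array([
    11327.548387, 11534.000000, 11391.677419, 11259.451613, 11968.366667,
    7844.387097, 6270.800000, 5103.516129, 4631.032258, 5092.928571,
    7800.258065, 8359.133333, 9495.062500,
])
pred = np.array([
    9375.120610, 11038.420268, 11571.006853, 10856.183244, 10148.262512,
    9433.060067, 7044.780142, 5037.930509, 5337.963486, 5767.081120,
    6616.610224, 9389.836132, 10258.791544,
])


def mse(y_true, y_pred):
    """Mean of the squared differences between forecasts and observations."""
    return float(np.mean((y_pred - y_true) ** 2))


mse_units = mse(truth, pred)                    # data in its original units
mse_thousands = mse(truth / 1000, pred / 1000)  # same data, in thousands

print(f"MSE in original units: {mse_units:.2f}")
print(f"MSE in thousands:      {mse_thousands:.4f}")
print(f"Ratio:                 {mse_units / mse_thousands:.0f}")
```

The forecasts are exactly as good in both cases; only the unit changed. That is why an MSE taken from a tutorial with differently scaled data tells you nothing about this one.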

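One way to judge whether your MSE is reasonable is to look at the root mean squared error, which is on the same scale as the original data. It is about 1000, so on average the forecasts differ from the true values by roughly 1000, against observed values that range from about 4,600 to 12,000.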

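(That second part is a bit of a simplification, since RMSE penalizes large errors more heavily than small ones, but it lets you roughly check whether the value you got is in the right ballpark.)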
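As a rough sanity check along these lines, the sketch below puts the RMSE next to the mean absolute error (MAE) and next to the mean of the observed values, again using the numbers from the question. The variable names are illustrative, and dividing the RMSE by the mean of the data is just one possible way to get a scale-free figure, not something the original code does.

```python
import numpy as np

# Observed and forecast values copied from the question (May 2016 to May 2017).
truth = np.array([
    11327.548387, 11534.000000, 11391.677419, 11259.451613, 11968.366667,
    7844.387097, 6270.800000, 5103.516129, 4631.032258, 5092.928571,
    7800.258065, 8359.133333, 9495.062500,
])
pred = np.array([
    9375.120610, 11038.420268, 11571.006853, 10856.183244, 10148.262512,
    9433.060067, 7044.780142, 5037.930509, 5337.963486, 5767.081120,
    6616.610224, 9389.836132, 10258.791544,
])

errors = pred - truth

# RMSE squares the errors before averaging, so large misses weigh more.
rmse = float(np.sqrt(np.mean(errors ** 2)))

# MAE averages the absolute errors, so every miss weighs the same.
mae = float(np.mean(np.abs(errors)))

# RMSE as a fraction of the average observed value (unit-free).
relative_rmse = rmse / float(np.mean(truth))

print(f"RMSE: {rmse:.2f}")
print(f"MAE:  {mae:.2f}")
print(f"RMSE relative to the mean of the data: {relative_rmse:.1%}")
```

The MAE comes out below the RMSE, which reflects the heavier penalty on the few large misses, and the relative figure puts the typical error at somewhat more than a tenth of the average monthly value.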

Tags: python, mse, statsmodels, arima