ARIMA model with pandas dataframe

Question

I have the following dataset, test_1:

    Date        Frequency
0   2020-01-20  10
1   2020-01-21  2
2   2020-01-22  1
3   2020-01-23  10
4   2020-01-24  6
... ...         ...
74  2020-04-04  7
75  2020-04-05  9
76  2020-04-06  8
77  2020-04-07  6
78  2020-04-08  1

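where Frequency is the column holding the user frequency for each date.

I would like to forecast the future trend, and I am considering an ARIMA model for this. I used this code: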

# fit model
model = ARIMA(test_1, order=(5,1,0))
model_fit = model.fit(disp=0)
print(model_fit.summary())
# plot residual errors
residuals = DataFrame(model_fit.resid)
residuals.plot()
pyplot.show()
residuals.plot(kind='kde')
pyplot.show()
print(residuals.describe())

But I got this error: ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

It is raised by the line model = ARIMA(test_1, order=(5,1,0)).

Do you know what it means, and how can I fix it?

Answer 1

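This error means that ARIMA expects a numeric array-like object, but you passed the whole DataFrame. Because test_1 contains the Date column alongside Frequency, casting it to a NumPy array produces dtype object, which ARIMA cannot work with.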
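
To see where the object dtype comes from, here is a minimal sketch that follows the hint in the error message and checks np.asarray directly. The demo frame is a made-up stand-in for test_1 with only three rows, so the values are illustrative rather than the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_1: a date column plus a numeric column.
demo = pd.DataFrame({
    "Date": ["2020-01-20", "2020-01-21", "2020-01-22"],
    "Frequency": [10, 2, 1],
})

# Casting the whole frame mixes dates and integers,
# so NumPy falls back to the generic object dtype.
whole_dtype = np.asarray(demo).dtype

# Casting only the numeric column keeps a numeric dtype.
column_dtype = np.asarray(demo["Frequency"]).dtype

print(whole_dtype)
print(column_dtype)
```

The first dtype is object and the second is numeric, which is why selecting the single column makes the error go away.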

This can be solved by passing test_1["Frequency"] instead of test_1. I have also fixed a few other issues I ran into in your code:

import pandas as pd
from statsmodels.tsa.arima_model import ARIMA
import matplotlib.pyplot as pyplot

# fit model
model = ARIMA(test_1["Frequency"], order=(5,1,0)) #<--- change this
model_fit = model.fit(disp=0)
print(model_fit.summary())
# plot residual errors
residuals = pd.DataFrame(model_fit.resid)
residuals.plot(kind='kde')
print(residuals.describe())
pyplot.show()

python

time-series

pandas

arima