Negative results in ARIMA model

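I am trying to forecast daily revenue through the end of the month by learning from the previous month. Because revenue behaves differently on weekdays and weekends, I decided to use a time series model (ARIMA) in Python.

Here is the Python code I am using: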

import itertools
import pandas as pd
import numpy as np
from datetime import datetime, date, timedelta
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import calendar

data_temp = [['01/03/2020',53921.785],['02/03/2020',97357.9595],['03/03/2020',95353.56893],['04/03/2020',93319.6761999999],['05/03/2020',88835.79958],['06/03/2020',98733.0856000001],['07/03/2020',61501.03036],['08/03/2020',74710.00968],['09/03/2020',156613.20712],['10/03/2020',131533.9006],['11/03/2020',108037.3002],['12/03/2020',106729.43067],['13/03/2020',125724.79704],['14/03/2020',79917.6726599999],['15/03/2020',90889.87192],['16/03/2020',160107.93834],['17/03/2020',144987.72243],['18/03/2020',146793.40641],['19/03/2020',145040.69416],['20/03/2020',140467.50472],['21/03/2020',69490.18814],['22/03/2020',82753.85331],['23/03/2020',142765.14863],['24/03/2020',121446.77825],['25/03/2020',107035.29359],['26/03/2020',98118.19468],['27/03/2020',82054.8721099999],['28/03/2020',61249.91097],['29/03/2020',72435.6711699999],['30/03/2020',127725.50818],['31/03/2020',77973.61724]] 
panel = pd.DataFrame(data_temp, columns = ['Date', 'revenue'])

pred_result=pd.DataFrame(columns=['revenue'])
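# Dates are day-first (DD/MM/YYYY); parse them explicitly so 01/03/2020 is 1 March, not 3 January
panel['Date'] = pd.to_datetime(panel['Date'], format='%d/%m/%Y')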
panel.set_index('Date', inplace=True)
ts = panel['revenue']

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))

seasonal_pdq = [(x[0], x[1], x[2], 7) for x in list(itertools.product(p, d, q))]
aic = float('inf')
for es in [True,False]:
    for param in pdq:
      for param_seasonal in seasonal_pdq:
        try:
          mod = sm.tsa.statespace.SARIMAX(ts,
                                          order=param,
                                          seasonal_order=param_seasonal,
                                          enforce_stationarity=es,
                                          enforce_invertibility=False)
          results = mod.fit()
          if results.aic<aic:
            param1=param
            param2=param_seasonal
            aic=results.aic
            es1=es
          #print('ARIMA{}x{} enforce_stationarity={} - AIC:{}'.format(param, param_seasonal,es,results.aic))
        except Exception:  # skip parameter combinations that fail to fit
          continue
print('Best model parameters: ARIMA{}x{} - AIC:{} enforce_stationarity={}'.format(param1, param2, aic,es1))

mod = sm.tsa.statespace.SARIMAX(ts,
                                order=param1,
                                seasonal_order=param2,
                                enforce_stationarity=es1,
                                enforce_invertibility=False)
results = mod.fit()

pred_uc = results.get_forecast(steps=calendar.monthrange(datetime.now().year,datetime.now().month)[1]-datetime.now().day+1)
pred_ci = pred_uc.conf_int()
ax = ts.plot(label='observed', figsize=(12, 5))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
plt.legend()
plt.show()

predict=pred_uc.predicted_mean.to_frame()
predict.reset_index(inplace=True)
predict.rename(columns={'index': 'date',0: 'revenue_forcast'}, inplace=True)
display(predict)

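The output is as follows:

As you can see, the forecast takes negative values because of its negative slope.

Since I am forecasting revenue, the result cannot be below zero, and the negative slope also looks strange.

Is there something wrong with my approach? How can I improve it?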

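You cannot force an ARIMA model to produce only positive values. However, when you want to forecast a quantity that is always positive, the classic "trick" is to apply a function that maps the positive numbers onto the whole real line (ℝ) and to model the transformed series instead. The log function is a good example.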

panel['log_revenue'] = np.log(panel['revenue'])

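Now fit the model to the log_revenue column and forecast that instead.

If a forecast value on the log scale turns out negative, that is fine: your actual revenue forecast is np.exp(predict), which is always positive.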

The solution is to get more historical data.