Negative results in ARIMA model
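I am trying to forecast daily revenue up to the end of the month by learning from the previous month. Because revenue behaves differently on weekdays and weekends, I decided to use a time-series model (ARIMA) in Python.
Here is the Python code I am using: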
import itertools
import pandas as pd
import numpy as np
from datetime import datetime, date, timedelta
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import calendar
data_temp = [['01/03/2020',53921.785],['02/03/2020',97357.9595],['03/03/2020',95353.56893],['04/03/2020',93319.6761999999],['05/03/2020',88835.79958],['06/03/2020',98733.0856000001],['07/03/2020',61501.03036],['08/03/2020',74710.00968],['09/03/2020',156613.20712],['10/03/2020',131533.9006],['11/03/2020',108037.3002],['12/03/2020',106729.43067],['13/03/2020',125724.79704],['14/03/2020',79917.6726599999],['15/03/2020',90889.87192],['16/03/2020',160107.93834],['17/03/2020',144987.72243],['18/03/2020',146793.40641],['19/03/2020',145040.69416],['20/03/2020',140467.50472],['21/03/2020',69490.18814],['22/03/2020',82753.85331],['23/03/2020',142765.14863],['24/03/2020',121446.77825],['25/03/2020',107035.29359],['26/03/2020',98118.19468],['27/03/2020',82054.8721099999],['28/03/2020',61249.91097],['29/03/2020',72435.6711699999],['30/03/2020',127725.50818],['31/03/2020',77973.61724]]
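panel = pd.DataFrame(data_temp, columns=['Date', 'revenue'])
pred_result = pd.DataFrame(columns=['revenue'])
# The dates are day/month/year, so parse them with an explicit format
panel['Date'] = pd.to_datetime(panel['Date'], format='%d/%m/%Y')
panel.set_index('Date', inplace=True)
# Give the series an explicit daily frequency so the forecast gets proper dates
ts = panel['revenue'].asfreq('D')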
# Grid-search (p, d, q) x (P, D, Q, 7), each order in {0, 1}, by AIC
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 7) for x in pdq]
aic = float('inf')
for es in [True, False]:
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                mod = sm.tsa.statespace.SARIMAX(ts,
                                                order=param,
                                                seasonal_order=param_seasonal,
                                                enforce_stationarity=es,
                                                enforce_invertibility=False)
                results = mod.fit(disp=False)
                if results.aic < aic:
                    param1 = param
                    param2 = param_seasonal
                    aic = results.aic
                    es1 = es
                # print('ARIMA{}x{} enforce_stationarity={} - AIC:{}'.format(param, param_seasonal, es, results.aic))
            except Exception:
                # Skip parameter combinations that fail to fit
                continue
print('Best model parameters: ARIMA{}x{} - AIC:{} enforce_stationarity={}'.format(param1, param2, aic, es1))
# Refit the best model found by the grid search
mod = sm.tsa.statespace.SARIMAX(ts,
                                order=param1,
                                seasonal_order=param2,
                                enforce_stationarity=es1,
                                enforce_invertibility=False)
results = mod.fit(disp=False)
# Forecast the number of days left in the current month (including today)
today = datetime.now()
steps = calendar.monthrange(today.year, today.month)[1] - today.day + 1
pred_uc = results.get_forecast(steps=steps)
pred_ci = pred_uc.conf_int()
ax = ts.plot(label='observed', figsize=(12, 5))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
plt.legend()
plt.show()
# Name the forecast column explicitly (its default name differs between statsmodels versions)
predict = pred_uc.predicted_mean.rename('revenue_forecast').to_frame()
predict.index.name = 'date'
predict.reset_index(inplace=True)
display(predict)  # Jupyter/IPython; use print(predict) in a plain script
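In the output, the forecast takes negative values because of its negative slope.
Since I am forecasting revenue, the results cannot go below zero, and the negative slope also looks strange.
Is there something wrong with my approach?
How can I improve it?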
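You cannot force an ARIMA model to produce only positive values. However, when you want to forecast a quantity that is always positive, the classic 'trick' is to apply a function that maps positive values onto any real number (all of ℝ). The log function is a good example: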
panel['log_revenue'] = np.log(panel['revenue'])
Now forecast the log_revenue column instead.
Then it does not matter if the forecast values are negative, because your actual forecast is np.exp(predict), which is always positive.
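The real solution is to get more historical data: one month of daily values covers only about four weekly cycles, which is very little to fit a model with a 7-day seasonal component.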