Prediction of time series with minute interval using ARIMA in Python
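I am a beginner in machine learning for time series and need to develop a project whose data is recorded at one-minute intervals. Could someone help me create this algorithm?
Dataset: each value represents one minute of collection (9:00, 9:01, ...). Each collection lasts 10 minutes and was carried out in two months, that is, 10 values in January and 10 values in February.
Objective: I want my result to be a forecast of the next 10 minutes for March, for example: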
2020-03-01 9:00:00
2020-03-01 9:01:00
2020-03-01 9:02:00
2020-03-01 9:03:00
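Training: the training set must contain January and February as the reference for the forecast, taking into account that the data is a time series.
Seasonality (figure):
Forecast (figure):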
Current problem: the forecast appears to be failing. The historical data does not seem to be treated as a valid time series: in the seasonality image the data set shows up as a straight line. In the forecast figure, the forecast is the green line and the original data is the blue line. The date axis runs until 2020-11-01 when it should stop at 2020-03-01, and the original data forms a rectangle in the graph.
script.py
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import pmdarima as pm
import matplotlib.pyplot as plt


class operationsArima():

    @staticmethod
    def ForecastingWithArima():
        try:
            # Import: expects a 'date' column, parsed and used as the index
            data = pd.read_csv('minute.csv', parse_dates=['date'], index_col='date')

            # Plot
            fig, axes = plt.subplots(2, 1, figsize=(10, 5), dpi=100, sharex=True)

            # Usual Differencing (lag 1)
            axes[0].plot(data[:], label='Original Series')
            axes[0].plot(data[:].diff(1), label='Usual Differencing')
            axes[0].set_title('Usual Differencing')
            axes[0].legend(loc='upper left', fontsize=10)
            print("[OK] Generated axes")

            # Seasonal Differencing (lag 11)
            axes[1].plot(data[:], label='Original Series')
            axes[1].plot(data[:].diff(11), label='Seasonal Differencing', color='green')
            axes[1].set_title('Seasonal Differencing')
            axes[1].legend(loc='upper left', fontsize=10)
            plt.suptitle('Drug Sales', fontsize=16)
            plt.show()

            # Seasonal - fit stepwise auto-ARIMA with a seasonal period of 11
            smodel = pm.auto_arima(data, start_p=1, start_q=1,
                                   test='adf',
                                   max_p=3, max_q=3, m=11,
                                   start_P=0, seasonal=True,
                                   d=None, D=1, trace=True,
                                   error_action='ignore',
                                   suppress_warnings=True,
                                   stepwise=True)
            print(smodel.summary())
            print("[OK] Generated model")

            # Forecast
            n_periods = 11
            fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True)
            # 'MS' is pandas' month-start frequency, so this index advances
            # one month per step, starting from the last observed timestamp
            index_of_fc = pd.date_range(data.index[-1], periods=n_periods, freq='MS')

            # make series for plotting purpose
            # (np.asarray works whether predict returns an array or a Series)
            fitted_series = pd.Series(np.asarray(fitted), index=index_of_fc)
            lower_series = pd.Series(confint[:, 0], index=index_of_fc)
            upper_series = pd.Series(confint[:, 1], index=index_of_fc)
            print("[OK] Generated series")

            # Plot the original data, the forecast and its confidence band
            plt.plot(data)
            plt.plot(fitted_series, color='darkgreen')
            plt.fill_between(lower_series.index,
                             lower_series,
                             upper_series,
                             color='k', alpha=.15)
            plt.title("ARIMA - Final Forecast - Drug Sales")
            plt.show()
            print("[SUCCESS] Generated forecast")

        except Exception as e:
            print("[FAILED] Caused by: {}".format(e))


if __name__ == "__main__":
    flow = operationsArima()
    flow.ForecastingWithArima()  # Init script
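The date axis stretching over many months is consistent with the forecast index in the script being built with `freq='MS'`, which is pandas' month-start alias. A minimal sketch of the difference, assuming a last observation at 2020-02-01 09:10 (a made-up timestamp, since minute.csv is not shown) and a 10-step horizon:

```python
import pandas as pd

# Hypothetical timestamp of the last training observation (not from the source data).
last_obs = pd.Timestamp("2020-02-01 09:10:00")
n_periods = 10

# As in the script: 'MS' means "month start", so each step is one month.
monthly_index = pd.date_range(last_obs, periods=n_periods, freq="MS")

# What the stated objective asks for: ten one-minute steps on 2020-03-01.
minute_index = pd.date_range("2020-03-01 09:00:00", periods=n_periods, freq="min")

print("monthly index spans:", monthly_index[-1] - monthly_index[0])
print("minute index:", minute_index[0], "->", minute_index[-1])
```

Passing an explicit start such as 2020-03-01 09:00 is one way to land the forecast on the March timestamps listed in the objective; whether the values forecast for those timestamps are meaningful is a separate question.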
Summary:
                                SARIMAX Results
================================================================================
Dep. Variable:                      y   No. Observations:                   22
Model:           SARIMAX(0, 1, 0, 11)   Log Likelihood                     nan
Date:                Mon, 13 Apr 2020   AIC                                nan
Time:                        21:19:10   BIC                                nan
Sample:                             0   HQIC                               nan
                                 - 22
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept           0   5.33e-13          0      1.000   -1.05e-12    1.05e-12
sigma2          1e-10   5.81e-10      0.172      0.863   -1.04e-09    1.24e-09
===================================================================================
Ljung-Box (Q):                    nan   Jarque-Bera (JB):                  nan
Prob(Q):                          nan   Prob(JB):                          nan
Heteroskedasticity (H):           nan   Skew:                              nan
Prob(H) (two-sided):              nan   Kurtosis:                          nan
===================================================================================
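I see several problems here. Because you have two short series sampled at one-minute frequency and separated by a month, the straight line you mention in the blue curve is expected: the plot simply joins the end of one series to the start of the next. The green line also looks like the original data itself, which means the model's forecast is exactly the same as your original data.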
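To make the point about the green line concrete: the summary reports SARIMAX(0, 1, 0, 11) with an intercept of 0, which, if I read it correctly, is a pure seasonal difference at lag 11. A model like that forecasts each point as the value 11 steps earlier, so it replays the last session. A minimal sketch with made-up numbers (the function name `seasonal_naive_forecast` and the data are hypothetical, not taken from the question):

```python
import numpy as np

def seasonal_naive_forecast(y, m, horizon, intercept=0.0):
    """Forecast y[T+h] = y[T+h-m] + intercept, reusing forecasts once h > m."""
    history = list(y)
    for _ in range(horizon):
        # The value m steps back may itself be an earlier forecast.
        history.append(history[-m] + intercept)
    return np.array(history[len(y):])

# Hypothetical data: two sessions of 11 readings each (22 observations in total).
january = np.arange(11, dtype=float)
february = np.arange(11, dtype=float) + 100.0
y = np.concatenate([january, february])

forecast = seasonal_naive_forecast(y, m=11, horizon=11)

# With a zero intercept the forecast is a copy of the last 11 observations.
print(np.array_equal(forecast, february))
```

With only two sessions there is a single seasonal difference per position to learn from, so it is not surprising that nothing richer than this copy gets selected.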
Finally, I don't think it is a good idea to put two independent time series together...
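One way to check whether a file really holds one continuous series or several separate ones is to look at the time step between consecutive rows. A rough sketch, assuming two 10-minute sessions one month apart as described in the question (the timestamps and values are invented stand-ins for minute.csv):

```python
import pandas as pd

# Hypothetical stand-in for minute.csv: two 10-minute sessions a month apart.
index = pd.date_range("2020-01-01 09:00", periods=10, freq="min").append(
    pd.date_range("2020-02-01 09:00", periods=10, freq="min")
)
data = pd.Series(range(20), index=index, name="value", dtype=float)

# Time elapsed since the previous observation; the first entry is NaT.
step = data.index.to_series().diff()

# A new session starts wherever the step is larger than one minute.
session_id = (step > pd.Timedelta(minutes=1)).cumsum()

# Split the data into one Series per session.
sessions = [group for _, group in data.groupby(session_id.to_numpy())]
for s in sessions:
    print(s.index[0], "->", s.index[-1], f"({len(s)} points)")
print("largest step:", step.max())
```

If the largest step is far bigger than the sampling interval, as it is here, a single ARIMA fit treats the month-long jump as if it were one more minute, so the sessions are probably better handled separately.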