Failure to predict with ETS

Good morning, everyone. I am trying to produce a forecast with ETS.

I have the following code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sktime.forecasting.ets import AutoETS


datos = [21.5294, 21.5228, 21.5289, 21.5096, 21.506, 21.5119, 21.5173, 21.5308, 21.5355, 21.5181, 21.5, 21.4972, 21.5067, 21.5149, 21.4994, 21.4967, 21.4774, 21.4662, 21.4752, 21.4858, 21.4581, 21.4398, 21.4385, 21.4471, 21.4399, 21.444, 21.4555, 21.4366, 21.4402, 21.4371, 21.4317, 21.4342, 21.411, 21.4174, 21.4149, 21.4151, 21.4186, 21.4411, 21.4569, 21.4628, 21.448, 21.4468, 21.4357, 21.4329, 21.4543, 21.4429, 21.4478, 21.4423, 21.4536, 21.4416, 21.4384, 21.4378, 21.4622, 21.4413, 21.4315, 21.4419, 21.4323, 21.429, 21.4103, 21.4194, 21.4364, 21.4245, 21.4348, 21.4276, 21.4113, 21.4235, 21.407, 21.412, 21.4263, 21.431, 21.4362, 21.432, 21.4445, 21.4487, 21.4623, 21.4766, 21.4785, 21.4891, 21.4869, 21.4903, 21.4839, 21.4856, 21.4909, 21.5048, 21.5005, 21.4905, 21.4906, 21.4914, 21.5052, 21.4898, 21.5232, 21.5234, 21.5086, 21.5108, 21.5017, 21.5141, 21.5055, 21.4953, 21.4618, 21.4504, 21.4667, 21.4602, 21.453, 21.4497, 21.4446, 21.4308, 21.4347, 21.4512, 21.4675, 21.4675, 21.465, 21.4624, 21.4682, 21.472, 21.4632, 21.4644, 21.4615, 21.4604, 21.4679, 21.4672]
indice = pd.date_range("2020-10-31 23:57:00", periods=len(datos), freq="T")

datos = pd.Series(data=datos, index=indice)

datos = datos.asfreq(freq='T')


pasado = datos[:100]
futuro = datos[100:]


model_auto = AutoETS(auto=True, initialization_method='heuristic', allow_multiplicative_trend=True, n_jobs=-1, sp=10)
model_auto.fit(pasado)


lista = list(np.array(range(20))+1)
prediccion = model_auto.predict(lista)

#print(pasado)
#print(futuro)
#print(prediccion)

pasado.plot()
futuro.plot()
prediccion.plot()
plt.show()

The result is as follows:

[Image: Predict]

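The blue line is the data I used to train the model.

The orange line is the 'future' data (futuro).

The green line is the prediction, which should be close to the orange line.

I don't understand why the prediction is the same value at every step.

I would like to hear your opinion. Do you know why the forecast behaves this way, and how can I fix it?

Thanks.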

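This is not an error... I am not an expert on the subject, but the short answer is: "it is due to the dataset you have".

The detailed answer is best given with an example... Imagine you have a different dataset. If you don't mind, it could be this one: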

datos = [
    30.05251300, 19.14849600, 25.31769200, 27.59143700,
    32.07645600, 23.48796100, 28.47594000, 35.12375300,
    36.83848500, 25.00701700, 30.72223000, 28.69375900,
    36.64098600, 23.82460900, 29.31168300, 31.77030900,
    35.17787700, 19.77524400, 29.60175000, 34.53884200,
    41.27359900, 26.65586200, 28.27985900, 35.19115300,
    42.20566386, 24.64917133, 32.66733514, 37.25735401,
    45.24246027, 29.35048127, 36.34420728, 41.78208136,
    49.27659843, 31.27540139, 37.85062549, 38.83704413,
    51.23690034, 31.83855162, 41.32342126, 42.79900337,
    55.70835836, 33.40714492, 42.31663797, 45.15712257,
    59.57607996, 34.83733016, 44.84168072, 46.97124960,
    60.01903094, 38.37117851, 46.97586413, 50.73379646,
    61.64687319, 39.29956937, 52.67120908, 54.33231689,
    66.83435838, 40.87118847, 51.82853579, 57.49190993,
    65.25146985, 43.06120822, 54.76075713, 59.83447494,
    73.25702747, 47.69662373, 61.09776802, 66.05576122]

indice = pd.date_range("2020-10-31 23:57:00", periods=len(datos), freq="T")

datos = pd.Series(data=datos, index=indice)
        
datos = datos.asfreq(freq='T')

With it, you would end up with code like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
    
datos = [
        30.05251300, 19.14849600, 25.31769200, 27.59143700,
        32.07645600, 23.48796100, 28.47594000, 35.12375300,
        36.83848500, 25.00701700, 30.72223000, 28.69375900,
        36.64098600, 23.82460900, 29.31168300, 31.77030900,
        35.17787700, 19.77524400, 29.60175000, 34.53884200,
        41.27359900, 26.65586200, 28.27985900, 35.19115300,
        42.20566386, 24.64917133, 32.66733514, 37.25735401,
        45.24246027, 29.35048127, 36.34420728, 41.78208136,
        49.27659843, 31.27540139, 37.85062549, 38.83704413,
        51.23690034, 31.83855162, 41.32342126, 42.79900337,
        55.70835836, 33.40714492, 42.31663797, 45.15712257,
        59.57607996, 34.83733016, 44.84168072, 46.97124960,
        60.01903094, 38.37117851, 46.97586413, 50.73379646,
        61.64687319, 39.29956937, 52.67120908, 54.33231689,
        66.83435838, 40.87118847, 51.82853579, 57.49190993,
        65.25146985, 43.06120822, 54.76075713, 59.83447494,
        73.25702747, 47.69662373, 61.09776802, 66.05576122]
    
indice = pd.date_range("2020-10-31 23:57:00", periods=len(datos), freq="T")
    
datos = pd.Series(data=datos, index=indice)
    
datos = datos.asfreq(freq='T')
          
          
pasado = datos[:48]
futuro = datos[47:]

              
modelo = ETSModel(datos, error="add", trend="add", seasonal="add",
                    damped_trend=True, seasonal_periods=4)
#modelo_fit = modelo.fit(maxiter=10000)
fit = modelo.fit()
    
print(fit.summary())
    
pred = fit.get_prediction(start='2020-11-01 00:44:00', end='2020-11-01 01:04:00')
    
df = pred.summary_frame(alpha=0.05)
    
    
simulated = fit.simulate(anchor="end", nsimulations=10, repetitions=100)

for i in range(simulated.shape[1]):
  simulated.iloc[:,i].plot(label='_', color='gray', alpha=0.1)
      
df["mean"].plot(label='mean prediction')
df["pi_lower"].plot(linestyle='--', color='tab:cyan', label='95% interval')
df["pi_upper"].plot(linestyle='--', color='tab:cyan', label='_')

pred.endog.plot(label='data')
plt.legend()
plt.show()

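You will get a result like this:

Your data is shown in orange. The ETS model estimates the mean of the data (blue) and the range within which the data varies around that mean (the dashed cyan lines). Then, for the forecast, the model runs simulations: 10 steps ahead, repeated 100 times (the gray lines).

In this particular case the model fits the data very well... of course! It is a textbook example, so it works perfectly; in everyday practice things rarely match the theory this neatly.

Although you used a different library, this generally explains why you got the result you did.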

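The fitted ETS model offers several functions for prediction:

  • forecast: out-of-sample forecasts
  • predict: in-sample and out-of-sample prediction
  • simulate: runs simulations of the state space model
  • get_prediction: in-sample and out-of-sample prediction, plus prediction intervals.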

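In your example the data is, for lack of a better word, random in the eyes of the model. This particular model has a hard time deciding where the data will go next, so it estimates a mean together with upper and lower bounds within which the data may lie in the future.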
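A hand-written sketch can show why such a forecast comes out as a flat line. It assumes the fitted model ends up with only a level component (no trend, no seasonality), which the question does not actually show; in that case ETS reduces to simple exponential smoothing. The function ses_forecast and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def ses_forecast(y, alpha, horizon):
    """Simple exponential smoothing: ETS with a level but no trend or seasonality."""
    level = y[0]  # initial level: the first observation (a simple, common choice)
    for obs in y[1:]:
        # Move the level part of the way towards each new observation.
        level = alpha * obs + (1 - alpha) * level
    # The level is the only state, so every future step gets the same forecast.
    return np.full(horizon, level)

# Illustrative values in the same range as the question's series.
y = np.array([21.47, 21.46, 21.46, 21.47, 21.46, 21.47, 21.47])
flat = ses_forecast(y, alpha=0.5, horizon=5)
print(flat)
```

Whatever the horizon, the output repeats one number: the last smoothed level. The uncertainty about the future shows up in the prediction interval, not in the point forecast.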

Let's run the same code, changing only the data; this is what you get:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

    
# datos is the 120-point series from the question, built the same way as before
pasado = datos[:100]
futuro = datos[99:]
print(futuro)
        
modelo = ETSModel(datos, error="add", trend="add", seasonal="add",
              damped_trend=True, seasonal_periods=4)
#modelo_fit = modelo.fit(maxiter=10000)
fit = modelo.fit()

print(fit.summary())

#prediccion = modelo_fit.get_prediction(start='2020-11-01 01:37:00', end='2020-11-01 01:57:00')
pred = fit.get_prediction(start='2020-11-01 01:36:00', end='2020-11-01 01:56:00')

df = pred.summary_frame(alpha=0.05)




simulated = fit.simulate(anchor="end", nsimulations=20, repetitions=100)
for i in range(simulated.shape[1]):
  simulated.iloc[:,i].plot(label='_', color='gray', alpha=0.1)


df["mean"].plot(label='mean prediction')
df["pi_lower"].plot(linestyle='--', color='tab:cyan', label='95% interval')
df["pi_upper"].plot(linestyle='--', color='tab:cyan', label='_')
pred.endog.plot(label='data')

pasado.plot(label='Pasado')
futuro.plot(label='Futuro')



plt.legend()
plt.show()

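After the training data (green) there is a kind of bubble (the region between the dashed cyan lines), which is the model's estimate of where the data may lie in the future. The line that appears to you as a constant value is therefore the estimated mean of the future values predicted by the model. In other words, given this data, the model cannot track the values in your futuro variable precisely.

Models that could (definitely... maybe) fit the data better are SARIMA or SARIMAX. For those it would be best to look for some mechanism or library that chooses the values of order = (p, d, q) and seasonal_order = (P, D, Q, s) automatically (although the computational cost may start to climb).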

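Of course there are many other models. Mathematica has a function (I can't recall its name right now) that searches for the model and parameter set that best fit the data. Maybe something similar exists somewhere in Python; if so, I would love to hear about it.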