Graphing The Results Of A Keras Stock Market Predictive Neural Network
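I recently attempted to build a neural network to predict the price fluctuations of individual stocks, using Keras as the neural network framework and Quandl as the database for retrieving historical stock prices. The code was written in Google Colaboratory and is shown below: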
import tensorflow as tf
import keras
import numpy as np
import quandl
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd
df = quandl.get("WIKI/FB", api_key = '_msxC6xspj2ddytz7-4u')
print(df)
df = df.reset_index()
df = df[['Adj. Close', 'Date']]
forecast_out = 1
df['Prediction'] = df[['Adj. Close']].shift(-(forecast_out))
X = np.array(df.drop(['Prediction'], axis=1))
X = X[:-forecast_out]
y = np.array(df['Prediction'])
y = y[:-forecast_out]
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
model = keras.models.Sequential()
model.add(keras.layers.Dense(units = 64, activation = 'relu'))
model.add(keras.layers.Dense(units = 1, activation = 'linear'))
model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])
History = model.fit(x_train, y_train, epochs=8)
prediction = model.predict(x_test)
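My main question is whether there is a graphing mechanism that would let me display x_test on the same graph as the predictions for that dataset. Since I have little experience with Python, I attempted to plot the data with the following commands: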
plt.plot(x_test)
plt.plot(prediction)
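However, this produced the following graph:
The main purpose of the program is to produce a system capable of predicting any set of prices for a particular stock over a particular period of time. It is therefore necessary to produce results similar to those shown in the final section of the following article: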
https://towardsdatascience.com/neural-networks-to-predict-the-market-c4861b649371
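A graph similar to the one shown there would make it possible to analyze the program's efficacy more transparently, and my question is about how to generate such a graph. Is there a way to generate it, or otherwise to observe such results? Thank you for your assistance.
An important point to note is that your training data and test data must lie on different parts of the x-axis.
For example, suppose the training set contains 100 observations and the test set contains 15 observations. The test set is the latter part of the time series, which the model (i.e. the model built using the training set) is used to predict.
Consider this example of using an LSTM to predict fluctuations in weekly hotel cancellations.
The training and validation predictions are generated using MinMaxScaler so that the neural network can interpret the data properly. As far as I can see, you have not carried out this step in your example. You should do so, otherwise your results are likely to be erroneous: your data is not on a common scale, so the LSTM model cannot interpret it properly.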
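The 100/15 split described above can be sketched as follows. This is a minimal illustration, not the code from the question: the helper name `chronological_split` and the synthetic `prices` array are assumptions made for the example. The point is that the series is cut at a fixed position rather than shuffled, so the test observations stay at the end of the x-axis.

```python
import numpy as np


def chronological_split(series, test_size):
    """Split a series so that the last `test_size` points form the test set.

    Hypothetical helper for illustration; it never shuffles the data.
    """
    series = np.asarray(series)
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    split = len(series) - test_size
    return series[:split], series[split:]


# Synthetic stand-in for a price series with 115 observations, oldest first.
prices = np.arange(115, dtype=float)

train, test = chronological_split(prices, test_size=15)
print(len(train), len(test))
```

Note that `train_test_split` from scikit-learn shuffles by default, which scatters the test points across the whole time range. Passing `shuffle=False` to it should give the same kind of ordered split as the helper above.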
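The scaling step mentioned above might look roughly like this. It is a sketch under stated assumptions: the `prices` values and the split position are made up, and fitting the scaler on the training portion only is a choice made for this example, not something the original code confirms.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical closing prices, oldest first. In the question these would
# come from the 'Adj. Close' column. MinMaxScaler expects a 2-D array.
prices = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 20.0, 19.0]).reshape(-1, 1)

split = 6  # assumed split position: first 6 points train, last 2 test
train, test = prices[:split], prices[split:]

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # learn min and max from the training data
test_scaled = scaler.transform(test)        # reuse the training min and max

# After predicting in the scaled space, map values back to price units.
restored = scaler.inverse_transform(test_scaled)
print(train_scaled.ravel())
print(restored.ravel())
```

Because the scaler only sees the training range, scaled test values can fall outside [0, 1] when prices move beyond that range, as they do here.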
# Generate predictions
trainpred = model.predict(X_train)
valpred = model.predict(X_val)
In [30]:
trainpred
Out[30]:
array([[0.32363528],
[0.3715328 ],
[0.46051228],
[0.35137814],
[0.38220662],
[0.41239697],
[0.3573438 ],
[0.43657327],
[0.47494155],
[0.467317 ],
[0.49233937],
[0.49879026],
[0.39996487],
[0.38200712],
[0.3309482 ],
[0.21176702],
[0.22578238],
[0.18523258],
[0.23222469],
[0.26659006],
[0.2368085 ],
[0.22137557],
[0.28356454],
[0.16753006],
[0.16966385],
[0.22060908],
[0.1916717 ],
[0.2181809 ],
[0.21772115],
[0.24777801],
[0.3288507 ],
[0.30944437],
[0.33784014],
[0.37927932],
[0.31557906],
[0.43595707],
[0.3505273 ],
[0.4064384 ],
[0.48314226],
[0.41506904],
[0.48799258],
[0.4533432 ],
[0.45297146],
[0.46697432],
[0.41320056],
[0.45331544],
[0.48461175],
[0.50513804],
[0.50340337],
[0.44235045],
[0.48495632],
[0.32804203],
[0.38383847],
[0.3502031 ],
[0.34179717],
[0.37928385],
[0.3852548 ],
[0.3978842 ],
[0.41324353],
[0.42388642],
[0.43424374],
[0.4359951 ],
[0.49112016],
[0.49098223],
[0.50581044],
[0.5686604 ],
[0.48814237],
[0.5679423 ],
[0.519874 ],
[0.42899352],
[0.4314267 ],
[0.3878218 ],
[0.3585053 ],
[0.31897143]], dtype=float32)
In [31]:
valpred
Out[31]:
array([[0.374565 ],
[0.311441 ],
[0.37602562],
[0.36187553],
[0.35613692],
[0.399751 ],
[0.40736055],
[0.41798282],
[0.36257237],
[0.4636013 ],
[0.47177172],
[0.45880812],
[0.5725181 ],
[0.5696718 ]], dtype=float32)
The predictions are converted back to normal values:
# Convert predictions back to normal values
trainpred = scaler.inverse_transform(trainpred)
Y_train = scaler.inverse_transform([Y_train])
valpred = scaler.inverse_transform(valpred)
Y_val = scaler.inverse_transform([Y_val])
predictions = valpred
The predictions are then plotted:
In [34]:
# Train predictions
trainpredPlot = np.empty_like(df)
trainpredPlot[:, :] = np.nan
trainpredPlot[previous:len(trainpred)+previous, :] = trainpred
In [35]:
# Validation predictions
valpredPlot = np.empty_like(df)
valpredPlot[:, :] = np.nan
valpredPlot[len(trainpred)+(previous*2)+1:len(df)-1, :] = valpred
In [36]:
# Plot all predictions
inversetransform, = plt.plot(scaler.inverse_transform(df))
trainpred, = plt.plot(trainpredPlot)
valpred, = plt.plot(valpredPlot)
plt.xlabel('Number of weeks')
plt.ylabel('Cancellations')
plt.title("Predicted vs. Actual Cancellations Per Week")
plt.show()
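The graph now displays as follows:
To summarize, two points:
Make sure that the training and test predictions do not overlap when plotted against the actual data. Overlapping them is erroneous, since the training and test predictions refer to two different sets of predictions.
Scale your data before feeding it into the LSTM; otherwise the neural network will not know how to interpret such data, and any results will be very superficial.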
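For the stock price case, the first point could be sketched as below. Everything here is illustrative: `place_on_axis` and `plot_predictions` are hypothetical helpers, and the arrays stand in for the actual prices and for the output of `model.predict` on an ordered (unshuffled) split. Each set of predictions is written into a NaN-filled array of the full series length, so matplotlib draws it only over its own stretch of the x-axis.

```python
import numpy as np
import matplotlib.pyplot as plt


def place_on_axis(values, start, total_length):
    """Return a NaN-filled array of `total_length` with `values` written from `start`."""
    values = np.asarray(values, dtype=float).ravel()
    placed = np.full(total_length, np.nan)
    placed[start:start + len(values)] = values
    return placed


def plot_predictions(actual, train_plot, test_plot):
    """Draw the actual series with both prediction segments on one graph."""
    plt.plot(actual, label="Actual")
    plt.plot(train_plot, label="Training prediction")
    plt.plot(test_plot, label="Test prediction")
    plt.xlabel("Observation")
    plt.ylabel("Price")
    plt.legend()
    plt.show()


# Synthetic stand-ins: 115 actual prices, 100 training and 15 test predictions.
actual = np.linspace(100.0, 160.0, 115)
train_pred = actual[:100] + 1.0  # stand-in for model.predict(x_train)
test_pred = actual[100:] - 1.0   # stand-in for model.predict(x_test)

train_plot = place_on_axis(train_pred, 0, len(actual))
test_plot = place_on_axis(test_pred, len(train_pred), len(actual))

if __name__ == "__main__":
    plot_predictions(actual, train_plot, test_plot)
```

matplotlib skips NaN values when drawing a line, which is why the two prediction curves appear as separate segments instead of overlapping.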