How to implement LSTM in tensorflow v1 from pandas dataframe
I have tried to follow tutorials on implementing this, but I keep running into dimensionality errors on the LSTM layer:
ValueError: Input 0 of layer LSTM is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 2]
import random
import numpy as np
import tensorflow as tf
from tensorflow import feature_column as fc
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, DenseFeatures, Reshape
from sklearn.model_selection import train_test_split
def df_to_dataset(features, target, batch_size=32):
    # Build a batched tf.data.Dataset from a feature DataFrame and a target Series
    return tf.data.Dataset.from_tensor_slices((dict(features), target)).batch(batch_size)
# Reset randomization seeds
np.random.seed(0)
tf.random.set_random_seed(0)
random.seed(0)
# Assume 'frame' to be a dataframe with 3 columns: 'optimal_long_log_return', 'optimal_short_log_return' (independent variables) and 'equilibrium_log_return' (dependent variable)
X = frame[['optimal_long_log_return', 'optimal_short_log_return']][:-1]
Y = frame['equilibrium_log_return'].shift(-1)[:-1]
X_train, _X, y_train, _y = train_test_split(X, Y, test_size=0.5, shuffle=False, random_state=1)
X_validation, X_test, y_validation, y_test = train_test_split(_X, _y, test_size=0.5, shuffle=False, random_state=1)
train = df_to_dataset(X_train, y_train)
validation = df_to_dataset(X_validation, y_validation)
test = df_to_dataset(X_test, y_test)
feature_columns = [fc.numeric_column('optimal_long_log_return'), fc.numeric_column('optimal_short_log_return')]
model = Sequential()
model.add(DenseFeatures(feature_columns, name='Metadata'))
model.add(LSTM(256, name='LSTM'))
model.add(Dense(1, name='Output'))
model.compile(loss='logcosh', metrics=['mean_absolute_percentage_error'], optimizer='Adam')
model.fit(train, epochs=10, validation_data=validation, verbose=1)
loss, accuracy = model.evaluate(test, verbose=0)
print(f'Target Error: {accuracy}%')
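For context on the error itself: DenseFeatures emits a 2-D tensor of shape (batch, n_features), while LSTM expects a 3-D tensor of shape (batch, timesteps, n_features). Below is a minimal sketch of the kind of windowed 3-D input that layer would accept, using a hypothetical window length and random data with the same column names as the frame above; it is only an illustration of the expected input rank, not the fix this question eventually needed.

import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Hypothetical frame with the same two feature columns and one target column
frame = pd.DataFrame({
    'optimal_long_log_return': np.random.randn(100),
    'optimal_short_log_return': np.random.randn(100),
    'equilibrium_log_return': np.random.randn(100),
})

window = 5  # assumed look-back length, purely for illustration
features = frame[['optimal_long_log_return', 'optimal_short_log_return']].values
target = frame['equilibrium_log_return'].shift(-1).values

# Build (samples, timesteps, n_features) windows; align each window with the next-step target
X = np.stack([features[i:i + window] for i in range(len(features) - window)])
y = target[window - 1:-1]

model = Sequential()
model.add(LSTM(256, input_shape=(window, features.shape[1])))  # 3-D input: (timesteps, n_features)
model.add(Dense(1))
model.compile(loss='logcosh', optimizer='Adam')
model.fit(X, y, batch_size=32, epochs=1, verbose=0)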
Having seen this issue asked elsewhere, I have tried setting input_shape=(None, *X_train.shape) and input_shape=X_train.shape, but neither worked. I also tried inserting a Reshape layer before the LSTM layer with model.add(Reshape(X_train.shape)), which resolved that error, but I got another problem in its place:
InvalidArgumentError: Input to reshape is a tensor with 64 values, but the requested shape has 8000
...and I'm not even sure that adding the Reshape layer does what I think it does. After all, why should reshaping the data into its own shape fix anything? Something is happening to my data that I just don't understand.
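For what it's worth, here is one hedged reading of those two numbers, assuming the default batch_size=32 from df_to_dataset above and roughly 4,000 training rows (neither of which is shown explicitly): the Reshape target is defined per sample, so Reshape(X_train.shape) asks each batch to be blown up to the size of the entire training set.

# Assumed figures, for illustration only
batch_size = 32        # default in df_to_dataset above
n_features = 2         # two feature columns
n_train_rows = 4000    # hypothetical len(X_train)

print(batch_size * n_features)     # 32 * 2   = 64   -> "tensor with 64 values"
print(n_train_rows * n_features)   # 4000 * 2 = 8000 -> "the requested shape has 8000"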
Additionally, I'm using this for time-series analysis (stock returns), so I would think the LSTM model ought to be stateful and temporal. Do I need to move the timestamp index into its own column in the pandas dataframe before converting it to a tensor?
Unfortunately I'm obligated to use tensorflow v1.15, as this is being developed on the QuantConnect platform and they presumably won't be updating the library any time soon.
EDIT: I've made some progress by using TimeseriesGenerator, but now I'm getting the following error (which returns no results on Google):
KeyError: 'No key found for either mapped or original key. Mapped Key: []; Original Key: []'
Code below (I'm sure I'm using the input_shape arguments incorrectly):
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

train = TimeseriesGenerator(X_train, y_train, 1, batch_size=batch_size)
validation = TimeseriesGenerator(X_validation, y_validation, 1, batch_size=batch_size)
test = TimeseriesGenerator(X_test, y_test, 1, batch_size=batch_size)
model = Sequential(name='Expected Equilibrium Log Return')
model.add(LSTM(256, name='LSTM', stateful=True, batch_input_shape=(1, batch_size, X_train.shape[1]), input_shape=(1, X_train.shape[1])))
model.add(Dense(1, name='Output'))
model.compile(loss='logcosh', metrics=['mean_absolute_percentage_error'], optimizer='Adam', sample_weight_mode='temporal')
print(model.summary())
model.fit_generator(train, epochs=10, validation_data=validation, verbose=1)
loss, accuracy = model.evaluate_generator(test, verbose=0)
print(f'Model Accuracy: {accuracy}')
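For reference, TimeseriesGenerator(data, targets, length, batch_size=...) yields batches of shape (batch_size, length, n_features), so with length=1 the matching layer argument would be batch_input_shape=(batch_size, 1, n_features) rather than (1, batch_size, n_features); passing input_shape alongside batch_input_shape is also redundant. A minimal self-contained sketch of that pairing, using random numpy data rather than the frame above:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

X = np.random.randn(129, 2)   # (rows, n_features), standing in for X_train
y = np.random.randn(129)      # standing in for y_train

batch_size = 16               # 128 usable samples divide evenly into batches of 16
gen = TimeseriesGenerator(X, y, length=1, batch_size=batch_size)
xb, yb = gen[0]
print(xb.shape)               # (16, 1, 2): (batch_size, length, n_features)

model = Sequential()
# A stateful LSTM needs a fixed batch size; note the (batch, timesteps, features) order
model.add(LSTM(256, stateful=True, batch_input_shape=(batch_size, 1, X.shape[1])))
model.add(Dense(1))
model.compile(loss='logcosh', optimizer='Adam')
model.fit_generator(gen, epochs=1, verbose=0)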
It turns out this particular problem was related to QuantConnect's patching of pandas dataframes, which interfered with older versions of tensorflow/keras.
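For anyone hitting the same KeyError, one hedged workaround (not verified on QuantConnect itself) is to hand Keras plain numpy arrays instead of the patched DataFrame/Series objects, reusing the names from the snippet above:

import numpy as np

# Hypothetical workaround: strip the (possibly patched) pandas wrappers before Keras sees the data
X_train_arr = np.asarray(X_train, dtype='float32')   # equivalent to X_train.values
y_train_arr = np.asarray(y_train, dtype='float32')
train = TimeseriesGenerator(X_train_arr, y_train_arr, 1, batch_size=batch_size)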