时间序列数据中 LSTM 训练测试拆分的问题
Problem in LSTM train-test split in time series data
我正在尝试使用我的 csv 文件制作训练集和测试集来训练 LSTM。 csv 文件如下所示:
datetime invno inkw outkw Total wind_spd temp pres ts
2021-12-01 00:00:00 1 0.0 0.0 0.0 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 4 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 2 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 3 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 5 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 01:00:00 1 0.0 0.0 0.0, 9.8 -1.3 1007.7 1638288000.0
2021-12-01 01:00:00 4 0.0 0.0 0.0, 9.8 -1.3 1007.7 1638288000.0
.......... ........ . ... .... ... ... .... ... ......
.......... ........ . ... .... ... ... .... ... ......
2021-12-10 17:00:00 2 0.06735057830810548 0.087 23.9 2.3 -1.2 1007.6 163828800.0
2021-12-10 17:00:00 3 0.03403729248046875 0.091 24.1 2.3 -1.2 1007.6 163828800.0
2021-12-10 17:00:00 4 0.08401119232177734 0.09 24.3 2.3 -1.2 1007.6 163828800.0
2021-12-10 17:00:00 5 0.08356260681152344 0.087 24.6 2.3 -1.2 1007.6 163828800.0
制作训练集和测试集后的数据集形状:
(1170, 9)
Training shape: (930, 30, 8)
Testing shape: (185, 30, 8)
这是我的代码:
import os
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
#from sklearn.externals import joblib
import joblib
import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from numpy.random import seed
#from tensorflow import set_random_seed
import tensorflow
tensorflow.random.set_seed
import tensorflow as tf
#tf.logging.set_verbosity(tf.logging.ERROR)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dropout, Dense, LSTM, TimeDistributed, RepeatVector
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers
import plotly.graph_objects as go
dataset = pd.read_csv('./data/combined.csv')
print(dataset.shape)
dataset.fillna(0, inplace=True)
dataset = dataset.set_index('datetime')
train = dataset[:'2021-12-08 23:00:00']
test = dataset['2021-12-08 23:00:00':]
scaler = StandardScaler()
scaler = scaler.fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
TIME_STEPS=30
def create_sequences(X, y, time_steps=TIME_STEPS):
Xs, ys = [], []
for i in range(len(X)-time_steps):
Xs.append(X.iloc[i:(i+time_steps)].values)
ys.append(y.iloc[i+time_steps])
return np.array(Xs), np.array(ys)
X_train, y_train = create_sequences(train, train)
X_test, y_test = create_sequences(test, test)
print(f'Training shape: {X_train.shape}')
print(f'Testing shape: {X_test.shape}')
model = Sequential()
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(rate=0.2))
model.add(RepeatVector(X_train.shape[1]))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(rate=0.2))
model.add(TimeDistributed(Dense(X_train.shape[2])))
model.compile(optimizer='adam', loss='mae')
model.summary()
history = model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1 , shuffle=False)
每当我 运行 此代码时,我都会收到以下错误:
Traceback (most recent call last):
File "/Users/sudip/Desktop/workspace/local_work/LSTM_api/test-1.py", line 58, in <module>
X_train, y_train = create_sequences(train, train)
File "/Users/sudip/Desktop/workspace/local_work/LSTM_api/test-1.py", line 53, in create_sequences
Xs.append(X.iloc[i:(i+time_steps)].values)
AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
删除 iloc
和 values
后出现以下错误:
Epoch 1/100
Traceback (most recent call last):
File "/Users/sudip/Desktop/workspace/local_work/LSTM_api/test-1.py", line 77, in <module>
history = model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1 , shuffle=False)
File "/Users/sudip/Desktop/workspace/env/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/sudip/Desktop/workspace/env/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 58, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [16,30,8] vs. [16,8]
[[node gradient_tape/mean_absolute_error/sub/BroadcastGradientArgs
(defined at /Users/sudip/Desktop/workspace/env/lib/python3.9/site-packages/keras/optimizer_v2/optimizer_v2.py:464)
]] [Op:__inference_train_function_5593]
Errors may have originated from an input operation.
Input Source operations connected to node gradient_tape/mean_absolute_error/sub/BroadcastGradientArgs:
我认为错误来自输入形状。我可以得到一些帮助来解决这个问题吗?
如何根据日期和时间从时间序列数据中拆分训练和测试?
数据形状有问题。您的网络的输入形状和输出形状相同,但 X_train 和 y_train 的形状不同。
一个可以完成这项工作的简单模型:
model = Sequential()
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(y_train.shape[1]))
model.compile(optimizer='adam', loss='mae')
model.summary()
我正在尝试使用我的 csv 文件制作训练集和测试集来训练 LSTM。 csv 文件如下所示:
datetime invno inkw outkw Total wind_spd temp pres ts
2021-12-01 00:00:00 1 0.0 0.0 0.0 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 4 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 2 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 3 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 00:00:00 5 0.0 0.0 0.0, 4.6 -0.9 1007.7 1638284400.0
2021-12-01 01:00:00 1 0.0 0.0 0.0, 9.8 -1.3 1007.7 1638288000.0
2021-12-01 01:00:00 4 0.0 0.0 0.0, 9.8 -1.3 1007.7 1638288000.0
.......... ........ . ... .... ... ... .... ... ......
.......... ........ . ... .... ... ... .... ... ......
2021-12-10 17:00:00 2 0.06735057830810548 0.087 23.9 2.3 -1.2 1007.6 163828800.0
2021-12-10 17:00:00 3 0.03403729248046875 0.091 24.1 2.3 -1.2 1007.6 163828800.0
2021-12-10 17:00:00 4 0.08401119232177734 0.09 24.3 2.3 -1.2 1007.6 163828800.0
2021-12-10 17:00:00 5 0.08356260681152344 0.087 24.6 2.3 -1.2 1007.6 163828800.0
制作训练集和测试集后的数据集形状:
(1170, 9)
Training shape: (930, 30, 8)
Testing shape: (185, 30, 8)
这是我的代码:
import os
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
#from sklearn.externals import joblib
import joblib
import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from numpy.random import seed
#from tensorflow import set_random_seed
import tensorflow
tensorflow.random.set_seed
import tensorflow as tf
#tf.logging.set_verbosity(tf.logging.ERROR)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dropout, Dense, LSTM, TimeDistributed, RepeatVector
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers
import plotly.graph_objects as go
dataset = pd.read_csv('./data/combined.csv')
print(dataset.shape)
dataset.fillna(0, inplace=True)
dataset = dataset.set_index('datetime')
train = dataset[:'2021-12-08 23:00:00']
test = dataset['2021-12-08 23:00:00':]
scaler = StandardScaler()
scaler = scaler.fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
TIME_STEPS=30
def create_sequences(X, y, time_steps=TIME_STEPS):
Xs, ys = [], []
for i in range(len(X)-time_steps):
Xs.append(X.iloc[i:(i+time_steps)].values)
ys.append(y.iloc[i+time_steps])
return np.array(Xs), np.array(ys)
X_train, y_train = create_sequences(train, train)
X_test, y_test = create_sequences(test, test)
print(f'Training shape: {X_train.shape}')
print(f'Testing shape: {X_test.shape}')
model = Sequential()
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(rate=0.2))
model.add(RepeatVector(X_train.shape[1]))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(rate=0.2))
model.add(TimeDistributed(Dense(X_train.shape[2])))
model.compile(optimizer='adam', loss='mae')
model.summary()
history = model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1 , shuffle=False)
每当我 运行 此代码时,我都会收到以下错误:
Traceback (most recent call last):
File "/Users/sudip/Desktop/workspace/local_work/LSTM_api/test-1.py", line 58, in <module>
X_train, y_train = create_sequences(train, train)
File "/Users/sudip/Desktop/workspace/local_work/LSTM_api/test-1.py", line 53, in create_sequences
Xs.append(X.iloc[i:(i+time_steps)].values)
AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
删除 iloc
和 values
后出现以下错误:
Epoch 1/100
Traceback (most recent call last):
File "/Users/sudip/Desktop/workspace/local_work/LSTM_api/test-1.py", line 77, in <module>
history = model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1 , shuffle=False)
File "/Users/sudip/Desktop/workspace/env/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/sudip/Desktop/workspace/env/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 58, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [16,30,8] vs. [16,8]
[[node gradient_tape/mean_absolute_error/sub/BroadcastGradientArgs
(defined at /Users/sudip/Desktop/workspace/env/lib/python3.9/site-packages/keras/optimizer_v2/optimizer_v2.py:464)
]] [Op:__inference_train_function_5593]
Errors may have originated from an input operation.
Input Source operations connected to node gradient_tape/mean_absolute_error/sub/BroadcastGradientArgs:
我认为错误来自输入形状。我可以得到一些帮助来解决这个问题吗?
如何根据日期和时间从时间序列数据中拆分训练和测试?
数据形状有问题。您的网络的输入形状和输出形状相同,但 X_train 和 y_train 的形状不同。
一个可以完成这项工作的简单模型:
model = Sequential()
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(y_train.shape[1]))
model.compile(optimizer='adam', loss='mae')
model.summary()