LSTM raises ValueError: Shapes (5, 2, 3) and (5, 3) are incompatible

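I want to do multi-class classification on time-series data. The dataset I obtained needs a lot of preprocessing, so just to understand how to implement the model I used the IRIS dataset (which is not suited to an LSTM), because it has exactly the same structure as my time-series data (4 input features, 1 output feature, 120 samples). I implemented the code below, but fitting the model with a batch size of 5 raises an invalid shape error (I changed the batch size several times, but it made no difference).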

import pandas
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the dataset
dataframe = pandas.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]

# Encode the class labels as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# Convert the integers to dummy variables (one-hot encoded)
dummy_Y = np_utils.to_categorical(encoded_Y)

X_train, X_test, y_train, y_test = train_test_split(X, dummy_Y, test_size=0.2)  # 20% is allocated for testing
X_train = X_train.reshape(60, 2, 4)
y_train = y_train.reshape(60, 2, 3)
y_train.shape, X_train.shape

((60, 2, 3), (60, 2, 4))


from keras.models import Sequential
from keras.layers import LSTM, Dense

# Create the neural network model
def create_nn_model():
    # Create a sequential model
    model = Sequential()
    model.add(LSTM(100, dropout=0.2, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    # Compile the model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = create_nn_model()
model.summary()

> Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100)               42000     
_________________________________________________________________
dense_2 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 303       
=================================================================
Total params: 52,403
Trainable params: 52,403
Non-trainable params: 0
model.fit(X_train,y_train,epochs=200,batch_size=5)

> ValueError                                Traceback (most recent call last)

<ipython-input-26-0aef33c299f0> in <module>()
----> 1 model.fit(X_train,y_train,epochs=200,batch_size=5) #X_train is independant variables. based on the amount of the data set data set will be trained by breaking into batches

9 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    984           except Exception as e:  # pylint:disable=broad-except
    985             if hasattr(e, "ag_error_metadata"):
--> 986               raise e.ag_error_metadata.to_exception(e)
    987             else:
    988               raise

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:830 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:813 run_step  *
        outputs = model.train_step(data)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:771 train_step  *
        loss = self.compiled_loss(
    /usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py:201 __call__  *
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    /usr/local/lib/python3.7/dist-packages/keras/losses.py:142 __call__  *
        losses = call_fn(y_true, y_pred)
    /usr/local/lib/python3.7/dist-packages/keras/losses.py:246 call  *
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206 wrapper  **
        return target(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/keras/losses.py:1631 categorical_crossentropy
        y_true, y_pred, from_logits=from_logits)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206 wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/keras/backend.py:4827 categorical_crossentropy
        target.shape.assert_is_compatible_with(output.shape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_shape.py:1161 assert_is_compatible_with
        raise ValueError("Shapes %s and %s are incompatible" % (self, other))

    ValueError: Shapes (5, 2, 3) and (5, 3) are incompatible

Your y_true and y_pred do not have the same shape. You may need to define your LSTM as follows:

model.add(LSTM(100,dropout=0.2, input_shape=(2,4), return_sequences=True))
....
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
....
dense_3 (Dense)              (None, 2, 3)              303        < ---
=================================================================

Update

Using return_sequences=True works because you defined your training pairs this way:

X_train = X_train.reshape(60, 2, 4)
y_train = y_train.reshape(60, 2, 3)

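These shapes represent (batch_size, timestep, input_length). Note, however, that it is the input X_train that has to be reshaped to meet the LSTM layer's input requirement, not y_train. When you defined the model you did not use return_sequences, so the last layer is just a three-way classifier with no timestep dimension, yet your y_train was reshaped to include one. If you set return_sequences=True and print the model summary, you will see that the output shape of the last layer is (None, 2, 3), which exactly matches the shape of y_train.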
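The shape difference described above can be checked without training anything. The following is a minimal sketch, assuming TensorFlow 2.x with `tf.keras`; the helper name `build` is hypothetical, and dropout is left out because it does not affect output shapes.

```python
import tensorflow as tf

def build(return_sequences):
    # Same layer stack as in the question; only return_sequences differs.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(2, 4)),  # (timestep, input_length)
        tf.keras.layers.LSTM(100, return_sequences=return_sequences),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])

last_only = build(return_sequences=False)
per_step = build(return_sequences=True)

# Without return_sequences the timestep axis is dropped after the LSTM,
# so targets must have shape (batch, 3).
print(last_only.output_shape)  # (None, 3)

# With return_sequences=True the Dense layers act on the last axis at
# every timestep, so targets must have shape (batch, 2, 3).
print(per_step.output_shape)   # (None, 2, 3)
```

Whichever variant is used, the target array passed to `fit` has to match the printed output shape, apart from the batch dimension.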

Before looking at what return_sequences does here, you may need to understand what a timestep means in an LSTM model; check out this answer. AFAIK, it depends on how many timesteps you set for your input: the LSTM cell can be applied once or several times, once per timestep. For an input with N timesteps (n in {1, 2, ..., N}), if you want the LSTM to return the output of every timestep (N outputs), set return_sequences=True; otherwise keep return_sequences=False. From the docs:

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.

In short, if it is set to True, the output of every timestep is returned; if it is set to False, only the last output is returned. For example:

inputs = tf.random.normal([32, 8])
inputs = tf.reshape(inputs, [-1, 2, 4 ]) # or [-1, 4, 2] # or [-1, 1, 8]
inputs.shape 
TensorShape([32, 2, 4]) # (batch_size, timestep, input_length)

lstm = tf.keras.layers.LSTM(10, return_sequences=True)
whole_seq_output = lstm(inputs)
print(whole_seq_output.shape)
(32, 2, 10) # (batch_size, timestep, output_length)

lstm = tf.keras.layers.LSTM(10, return_sequences=False)
last_seq_output = lstm(inputs)
print(last_seq_output.shape)
(32, 10) # (batch_size, output_length)
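To make the relationship between the two outputs concrete, here is a simplified NumPy sketch of an LSTM forward pass. It is an illustration under assumptions, not Keras's actual implementation: the function `lstm_forward`, the random weights `W`, `U`, `b`, and the gate ordering are chosen for this example only. It shows that the "last output" is simply the final timestep of the full sequence.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(inputs, W, U, b, return_sequences=False):
    """inputs: (batch, timesteps, features); W: (features, 4*units);
    U: (units, 4*units); b: (4*units,)."""
    batch, timesteps, _ = inputs.shape
    units = U.shape[0]
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(timesteps):
        # One application of the cell per timestep.
        z = inputs[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h                              # (batch, units)

rng = np.random.default_rng(0)
units = 10
x = rng.normal(size=(32, 2, 4))           # (batch_size, timestep, input_length)
W = rng.normal(size=(4, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

whole_seq = lstm_forward(x, W, U, b, return_sequences=True)
last_only = lstm_forward(x, W, U, b, return_sequences=False)

print(whole_seq.shape)                              # (32, 2, 10)
print(last_only.shape)                              # (32, 10)
print(np.allclose(whole_seq[:, -1, :], last_only))  # True
```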

Here is one way to approach the code above. The Iris data is taken from here.

import pandas 
dataframe = pandas.read_csv("/content/iris.csv")
dataframe.head(3)

   sepal.length  sepal.width  petal.length  petal.width variety
0           5.1          3.5           1.4          0.2  Setosa
1           4.9          3.0           1.4          0.2  Setosa
2           4.7          3.2           1.3          0.2  Setosa
dataframe.variety.unique()
array(['Setosa', 'Versicolor', 'Virginica'], dtype=object)
target_map = dict(zip(list(dataframe['variety'].unique()), 
                     ([0, 1, 2])))
target_map
{'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}
dataframe['target'] = dataframe.variety.map(target_map) 
dataframe.sample()
     sepal.length  sepal.width  petal.length  petal.width    variety  target
128           6.4          2.8           5.6          2.1  Virginica       2
X = dataframe.iloc[:, :4] 
Y = dataframe.iloc[:, 5]

X.shape, Y.shape
((150, 4), (150,))
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

OHE_Y = to_categorical(Y, num_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, OHE_Y, 
                                                      test_size=0.2)

X_train.shape
(120, 4)

# make it lstm compatible input 
X_train = X_train.values.reshape(-1, 1, 4)

X_train.shape ,y_train.shape
((120, 1, 4), (120, 3))

Model

from tensorflow.keras import Sequential 
from tensorflow.keras.layers import LSTM, Dense 

def create_nn_model():
  model = Sequential()
  model.add(LSTM(100, dropout=0.2, input_shape=(X_train.shape[1],
                                               X_train.shape[2])))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(3,activation='softmax'))
  model.compile(loss='categorical_crossentropy',
                optimizer='adam', metrics=['accuracy'])
  return model

model = create_nn_model()
model.summary()

model.fit(X_train, y_train, epochs=10,batch_size=5)

...
Epoch 9/10
3ms/step - loss: 0.5224 - accuracy: 0.7243
Epoch 10/10
3ms/step - loss: 0.5568 - accuracy: 0.7833

Inference

model.evaluate(X_train, y_train)
4ms/step - loss: 0.3843 - accuracy: 0.9583
[0.38432881236076355, 0.9583333134651184]

y_pred = model.predict(X_train).argmax(-1)
y_pred
array([2, 1, 1, 1, 1, 2, 2, 0, 1, 2, 2, 2, 0, 1, 1, 1, 0, 1, 0, 0, 2, 0,
       0, 2, 2, 0, 0, 2, 0, 0, 1, 0, 0, 1, 0, 2, 2, 0, 2, 2, 0, 2, 0, 0,
       1, 1, 2, 0, 1, 2, 1, 2, 0, 0, 2, 2, 2, 0, 0, 0, 2, 2, 2, 0, 0, 0,
       2, 2, 0, 2, 1, 0, 2, 1, 0, 0, 0, 1, 1, 1, 0, 2, 2, 1, 1, 0, 2, 0,
       0, 2, 1, 0, 2, 1, 1, 1, 1, 2, 1, 0, 1, 2, 1, 1, 2, 1, 1, 1, 2, 2,
       0, 1, 2, 1, 0, 0, 2, 1, 2, 0])
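The evaluation above is run on the training data. To score the held-out split instead, X_test needs the same (samples, 1, 4) reshape as X_train. The following is a self-contained sketch under assumptions: it loads Iris through scikit-learn's `load_iris` rather than the CSV used above, and the names `X_train_seq` and `X_test_seq` and the `random_state` value are introduced only for this example.

```python
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: load_iris stands in for the CSV file used above.
iris = load_iris()
X = iris.data                                                # (150, 4)
y = tf.keras.utils.to_categorical(iris.target, num_classes=3)  # (150, 3)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=iris.target)

# Apply the same LSTM-compatible reshape to both splits.
X_train_seq = X_train.reshape(-1, 1, 4)
X_test_seq = X_test.reshape(-1, 1, 4)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1, 4)),
    tf.keras.layers.LSTM(100, dropout=0.2),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(loss="categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])
model.fit(X_train_seq, y_train, epochs=10, batch_size=5, verbose=0)

# Loss and accuracy on data the model has not seen during training.
test_loss, test_acc = model.evaluate(X_test_seq, y_test, verbose=0)
y_pred = model.predict(X_test_seq, verbose=0).argmax(-1)

print(X_test_seq.shape)  # (30, 1, 4)
print(y_pred.shape)      # (30,)
print(test_loss, test_acc)
```

The reported numbers will vary from run to run because the weights are initialized randomly.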