LSTM for Video Input
I'm new to LSTMs and just trying them out.
I'm basically using an LSTM to determine the action type (5 different actions), e.g. running, dancing, etc. My input is 60 frames per action, and there are roughly 120 such videos.
train_x.shape = (120,192,192,60)
where 120 is the number of training sample videos, 192x192 is the frame size, and 60 is the number of frames.
train_y.shape = (120, 5), one-hot encoded, e.g. [1 0 0 0 0 ..... 0 0 0 0 1]
I'm not clear on how to pass the 3-D parameters to the LSTM (timesteps and features).
model.add(LSTM(100, input_shape=(train_x.shape[1],train_x.shape[2])))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(len(uniquesegments), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=100, batch_size=batch_size, verbose=1)
I get the following error:
Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 192, 192, 60)
Training data algorithm:
Loop through videos
    Loop through each frame of a video
        logic
        append to array
    convert to numpy array
    roll axis to convert (60, 192, 192) to (192, 192, 60)
    add to training list
convert training list to numpy array
Training list shape: (120, 192, 192, 60)
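A rough NumPy sketch of the loop described above; videos and load_frame are hypothetical placeholders for however the frames are actually read:

import numpy as np

training_list = []
for video in videos:                          # loop through videos
    frames = []
    for i in range(60):                       # loop through each frame of a video
        frames.append(load_frame(video, i))   # placeholder: returns a (192, 192) array
    frames = np.array(frames)                 # (60, 192, 192)
    frames = np.rollaxis(frames, 0, 3)        # roll axis -> (192, 192, 60)
    training_list.append(frames)              # add to training list

train_x = np.array(training_list)             # (120, 192, 192, 60)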
Looking at the documentation, it doesn't even seem like LSTM is meant to take an input_shape argument. That makes sense, because normally you should be feeding it a 1-D feature vector per timestep. That's why the docs say:
inputs: A 3D tensor with shape [batch, timesteps, feature]
What you're trying to do isn't going to work (I've also left you a comment explaining why you probably shouldn't be doing it that way).
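As a minimal illustration of the 3-D shape the docs describe (the sizes here are made up for the example, not taken from the question):

import numpy as np
import tensorflow as tf

# An LSTM consumes (batch, timesteps, features):
# here, 8 sequences of 60 timesteps with 10 features per timestep.
x = np.random.rand(8, 60, 10).astype("float32")
out = tf.keras.layers.LSTM(100)(x)
print(out.shape)   # (8, 100)

# A 4-D input such as (None, 192, 192, 60) does not fit this contract,
# which is exactly what "expected ndim=3, found ndim=4" is complaining about.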
First, you should know that video classification tasks are better handled by a Convolutional RNN than by an LSTM or any plain RNN cell, just as a CNN is better suited than an MLP for image classification tasks.
Those RNN cells (e.g. LSTM, GRU) expect inputs of shape (samples, timesteps, features). Since you are dealing with inputs of shape (samples, timesteps, width, height, channels), you should use tf.keras.layers.ConvLSTM2D instead.
The following example code shows how to build a model that can handle the video classification task:
import tensorflow as tf
from tensorflow.keras import models, layers
timesteps = 60
width = 192
height = 192
channels = 1
action_num = 5
model = models.Sequential(
    [
        layers.Input(
            shape=(timesteps, width, height, channels)
        ),
        layers.ConvLSTM2D(
            filters=64, kernel_size=(3, 3), padding="same",
            return_sequences=True, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool3D(
            pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
        ),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(
            filters=32, kernel_size=(3, 3), padding="same",
            return_sequences=True, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool3D(
            pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
        ),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(
            filters=16, kernel_size=(3, 3), padding="same",
            return_sequences=False, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool2D(
            pool_size=(2, 2), strides=(2, 2), padding="same"
        ),
        layers.BatchNormalization(),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dense(action_num, activation='softmax')
    ]
)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d (ConvLSTM2D) (None, 60, 192, 192, 64) 150016
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 60, 96, 96, 64) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 60, 96, 96, 64) 256
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D) (None, 60, 96, 96, 32) 110720
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 60, 48, 48, 32) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 60, 48, 48, 32) 128
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, 48, 48, 16) 27712
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 16) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 24, 24, 16) 64
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 256) 2359552
_________________________________________________________________
dense_1 (Dense) (None, 5) 1285
=================================================================
Total params: 2,649,733
Trainable params: 2,649,509
Non-trainable params: 224
_________________________________________________________________
Note that you should reorder your data to the shape (samples, timesteps, width, height, channels) before feeding it to the model above (i.e. not with np.reshape, but with something like np.moveaxis). In your case that is (120, 60, 192, 192, 1); you can then split the 120 videos into batches and feed them to the model.
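A small sketch of that reordering, assuming train_x still has the (120, 192, 192, 60) shape from the question (the batch size of 4 is just an example value):

import numpy as np

train_x = np.moveaxis(train_x, -1, 1)     # (120, 192, 192, 60) -> (120, 60, 192, 192)
train_x = np.expand_dims(train_x, -1)     # add a channel axis -> (120, 60, 192, 192, 1)

# The 120 videos can then be fed to the model in batches, e.g.:
model.fit(train_x, train_y, epochs=100, batch_size=4, verbose=1)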