TimeDistributed with LSTM in a keyword spotter
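I am working on a keyword spotter that processes an audio input and returns the class of the audio, based on a list of speech commands similar to the one shown here: https://www.tensorflow.org/tutorials/audio/simple_audio
Instead of processing only 1 second of audio as input, I want to be able to process multiple frames of audio, say 5 time steps of 10 ms each, and feed them into the machine learning model.
Essentially, this amounts to adding a TimeDistributed layer on top of my network.
The second thing I want to do is add an LSTM layer before the dense layer that maps my hidden layers to the output classes.
My question: how can I efficiently change the code below to add a TimeDistributed layer that takes in multiple time steps, and an LSTM layer?
Starting code: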
model = models.Sequential([
    layers.Input(shape=input_shape),
    preprocessing.Resizing(32, 32),
    norm_layer,
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_labels),
])
Model summary:
Input shape: (124, 129, 1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
resizing (Resizing)          (None, 32, 32, 1)         0
_________________________________________________________________
normalization (Normalization (None, 32, 32, 1)         3
_________________________________________________________________
conv2d (Conv2D)              (None, 30, 30, 32)        320
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 64)        18496
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 64)        0
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0
_________________________________________________________________
dense (Dense)                (None, 128)               1605760
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 1032
=================================================================
Total params: 1,625,611
Trainable params: 1,625,608
Non-trainable params: 3
_________________________________________________________________
Attempt 1: adding an LSTM layer
model = models.Sequential([
    layers.Input(shape=input_shape),
    preprocessing.Resizing(32, 32),
    norm_layer,
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.LSTM(32, activation='relu', input_shape=(1,128,98)),
    layers.Dense(num_labels),
])
Error: ValueError: Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 128]
Attempt 2: adding a TimeDistributed layer:
model = models.Sequential([
    layers.Input(shape=input_shape),
    preprocessing.Resizing(32, 32),
    norm_layer,
    TimeDistributed(layers.Conv2D(32, 3, activation='relu'), input_shape=(None, 32, 32, 1)),
    TimeDistributed(layers.Conv2D(64, 3, activation='relu'), input_shape=(None, 30, 30, 1)),
    TimeDistributed(layers.MaxPooling2D()),
    TimeDistributed(layers.Dropout(0.25)),
    TimeDistributed(layers.Flatten()),
    TimeDistributed(layers.Dense(128, activation='relu')),
    TimeDistributed(layers.Dropout(0.5)),
    TimeDistributed(layers.Flatten()),
    layers.Dense(num_labels),
])
Error: ValueError: Input 0 of layer conv2d_43 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [None, 32, 1]
I know there is something wrong with my dimensions, but I am not sure how to proceed.
The LSTM layer expects its input to be a 3D tensor of shape [batch, timesteps, feature].
Example code snippet:
import tensorflow as tf
inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)
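The first attempt in the question fails for exactly this reason: the tensor coming out of Dense(128) has shape [batch, 128], which is 2D. The sketch below only illustrates the shape requirement, by adding a time axis of length 1 with tf.expand_dims. The sizes 32, 128 and 4 are arbitrary illustrative values, not taken from the question's model.

```python
import tensorflow as tf

# (batch, feature): the kind of 2D tensor a Dense(128) layer produces.
features = tf.random.normal([32, 128])
lstm = tf.keras.layers.LSTM(4)

# Passing `features` directly would raise the ndim error from the question.
# Adding a time axis turns it into (batch, timesteps, feature).
sequence = tf.expand_dims(features, axis=1)

output = lstm(sequence)
print(sequence.shape, output.shape)
```

A sequence with a single time step satisfies the shape check but gives the LSTM nothing to model over time, so in practice the time axis should come from real consecutive frames rather than from expand_dims.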
tf.keras.layers.TimeDistributed expects an input tensor of shape (batch, time, ...).
Working example code:
inputs = tf.keras.Input(shape=(10, 128, 128, 3))
conv_2d_layer = tf.keras.layers.Conv2D(64, (3, 3))
outputs = tf.keras.layers.TimeDistributed(conv_2d_layer)(inputs)
print(outputs.shape)  # (None, 10, 126, 126, 64)