Python Tensorflow Shape Mismatch (WaveNet)
I am trying to run the WaveNet model specified in https://github.com/mjpyeon/wavenet-classifier/blob/master/WaveNetClassifier.py.
Part of my code is as follows:
def residual_block(self, x, i):
    tanh_out = Conv1D(self.n_filters, self.kernel_size, dilation_rate=self.kernel_size ** i,
                      padding='causal', name='dilated_conv_%d_tanh' % (self.kernel_size ** i),
                      activation='tanh')(x)
    sigm_out = Conv1D(self.n_filters, self.kernel_size, dilation_rate=self.kernel_size ** i,
                      padding='causal', name='dilated_conv_%d_sigm' % (self.kernel_size ** i),
                      activation='sigmoid')(x)
    # 'z' multiplies the outputs of the two Conv1D layers (one with a tanh
    # activation function, the other with a sigmoid activation function)
    z = Multiply(name='gated_activation_%d' % (i))([tanh_out, sigm_out])
    # Skip layer: 'z' passed through a 1x1 Conv1D layer
    skip = Conv1D(self.n_filters, 1, name='skip_%d' % (i))(z)
    # Residual layer: adds the skip-layer output and the original input
    res = Add(name='residual_block_%d' % (i))([skip, x])
    return res, skip
def train_dataset(self, X_train, y_train, validation_data=None, epochs=100):
    with tf.device('/GPU:0'):
        # 1. Input layer
        x = Input(shape=self.input_shape, name='original_input')

        # 2. Skip connections built from the specified number of residual blocks
        skip_connections = []
        out = Conv1D(self.n_filters, 2, dilation_rate=1, padding='causal',
                     name='dilated_conv_1')(x)
        for i in range(1, self.dilation_depth + 1):
            # The output from each residual block is fed into the next one
            out, skip = self.residual_block(out, i)
            skip_connections.append(skip)

        # 3. ReLU activation function
        out = Add(name='skip_connections')(skip_connections)
        out = Activation('relu')(out)

        # 4. Series of Conv1D and AveragePooling1D layers
        out = Conv1D(self.n_filters, 80, strides=1, padding='same', name='conv_5ms',
                     activation='relu')(out)
        out = AveragePooling1D(80, padding='same', name='downsample_to_200Hz')(out)
        out = Conv1D(self.n_filters, 100, padding='same', activation='relu',
                     name='conv_500ms')(out)
        out = Conv1D(self.output_shape[0], 100, padding='same', activation='relu',
                     name='conv_500ms_target_shape')(out)
        out = AveragePooling1D(100, padding='same', name='downsample_to_2Hz')(out)
        out = Conv1D(self.output_shape[0], int(self.input_shape[0] / 8000),
                     padding='same', name='final_conv')(out)
        out = AveragePooling1D(int(self.input_shape[0] / 8000), name='final_pooling')(out)

        # 5. Reshape to the output dimension and apply the final activation function
        out = Reshape(self.output_shape)(out)
        out = Activation('sigmoid')(out)
        print(out.shape)

        model = Model(x, out)
        model.summary()

        # Compile the model
        model.compile('adam', 'binary_crossentropy',
                      metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.7)])

        # Early stopping
        callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)
        history = model.fit(X_train, y_train, shuffle=True, epochs=epochs, batch_size=32,
                            validation_data=validation_data, callbacks=[callback])
        return history
Here, self.input_shape = X_train.shape and self.output_shape = (11,).

The model summary prints successfully, but then the following error is raised:

ValueError: Input 0 is incompatible with layer model_1: expected shape=(None, 19296, 110250), found shape=(32, 110250)

However, my X_train has shape (19296, 110250). I have been trying to figure out why X_train gets reshaped from (19296, 110250) to (32, 110250), but I can't find the reason. (19296 is the number of songs, and 110250 is the length of a 5-second audio file at a sample rate of 22050 Hz, processed with the Python Librosa library.)

What is wrong with my code? Thanks in advance!
Your data is missing a dimension. A Conv1D layer expects per-sample input of shape (timesteps, features), but you seem to have only one of the two. So maybe try something like this:
import tensorflow as tf

sample = 1
x_train = tf.random.normal((sample, 110250))

# Option 1: add a trailing channel dimension -> (samples, 110250, 1)
option1 = tf.expand_dims(x_train, axis=-1)
tf.print('expand_dims -->', option1.shape)

# Option 2: split each sample into (timesteps, features) -> (samples, 5, 22050)
option2 = tf.reshape(x_train, (tf.shape(x_train)[0], 5, 22050))
tf.print('reshape -->', option2.shape)

which prints:

expand_dims --> TensorShape([1, 110250, 1])
reshape --> TensorShape([1, 5, 22050])
Note that I only used a single sample here, but I think you get the idea.
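The other half of the fix, which also explains the error message itself: Input(shape=...) describes a single sample, not the whole dataset, so passing X_train.shape includes the 19296 sample count as if it were a timestep dimension, while model.fit feeds batches of 32. A minimal sketch under that assumption (tiny random stand-in data, hypothetical layer sizes):

```python
import numpy as np
import tensorflow as tf

# Stand-in for the question's data: a few "songs" of 110250 samples each.
n_songs, n_samples = 4, 110250
X_train = np.random.randn(n_songs, n_samples).astype('float32')

# Add a channel dimension so each song is (timesteps, features) = (110250, 1).
X_train = X_train[..., np.newaxis]            # shape (4, 110250, 1)

# Describe ONE sample: X_train.shape[1:], never the full X_train.shape.
inp = tf.keras.Input(shape=X_train.shape[1:], name='original_input')
out = tf.keras.layers.Conv1D(8, 2, padding='causal')(inp)
model = tf.keras.Model(inp, out)

# Keras prepends None for the batch dimension, which fit() fills per batch.
print(model.input_shape)
```

With self.input_shape set this way, the model expects (None, 110250, 1) and a batch of 32 songs arrives as (32, 110250, 1), so the mismatch disappears.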