Why does a model based on Dense layers give better results than one based on Conv2D?
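In TensorFlow, a model based on Dense layers gives better training results than a model based on equivalent Conv2D layers.
Results:
- Using Dense: loss: 16.1930 - mae: 2.5369 - mse: 16.1930
- Using Conv2D: loss: 83.7851 - mae: 6.5585 - mse: 83.7851
Should this be expected, or are we doing something wrong?
The code we used is below (adapted from here):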
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import pandas as pd
import sys
model_type = int(sys.argv[1])  # 0: Dense, else: Conv2D
verbose = 0
# load data & normalize
(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()
train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features_norm = (train_features - train_mean) / train_std
test_features_norm = (test_features - train_mean) / train_std
train_labels_norm = train_labels
test_labels_norm = test_labels
input_height = train_features_norm.shape[1]
# model
if model_type == 0:
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height,)),
        layers.Dense(20, activation='relu'),
        layers.Dense(1)])
else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1))
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height, 1, 1)),
        layers.Conv2D(20, (input_height, 1), activation='relu'),
        layers.Conv2D(1, (1, 1))])  # replacing this layer with Dense(1) gives the same results
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss='mse',
    metrics=['mae', 'mse'])
model.summary()
# training
# note: early_stop is created but never passed to fit() (no callbacks=[early_stop]),
# so training always runs for the full 1000 epochs
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=50)
history = model.fit(
    train_features_norm,
    train_labels_norm,
    epochs=1000,
    verbose=verbose,
    validation_split=0.1)
# results
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist)
rmse_final = np.sqrt(hist['val_mse'].iloc[-1])  # last epoch's validation MSE
print('Final Root Mean Square Error on validation set: {}'.format(round(rmse_final, 3)))
# compare how the model performs on the test dataset
mse, _, _ = model.evaluate(test_features_norm, test_labels_norm)
rmse = np.sqrt(mse)
print('Root Mean Square Error on test set: {}'.format(round(rmse, 3)))
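Note: model_type selects either the model based on Dense layers (= 0) or the model based on Conv2D layers (any other value).
Background
We have a system that does not support Dense layers (BeagleBone AI using TIDL). However, it does support Conv2D layers, and as far as we know, a Conv2D layer can be configured to be equivalent to a Dense layer.
For example, in a Dense layer with two units/outputs, no bias, and two inputs, the outputs are:
- O1 = W11 * I1 + W12 * I2
- O2 = W21 * I1 + W22 * I2
O - outputs, I - inputs, W - weights
Similarly, in a Conv2D layer with two 1x1 output channels, no bias, one 1x2 input channel, and a 1x2 kernel, the outputs are:
- O1 = K11 * I11 + K12 * I12
- O2 = K21 * I11 + K22 * I12
O - output channels, I - input channel, K - kernel weights
This means the two are mathematically equivalent. Yet training works better with the Dense layers.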
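A quick numerical check of the equivalence argument, written as a NumPy sketch rather than TensorFlow code. It assumes the Keras channels_last kernel layout (kernel_h, kernel_w, in_channels, out_channels) and uses a naive 'valid' cross-correlation, which is what Keras Conv2D computes; `dense_forward` and `conv2d_valid` are hypothetical helper names. When the kernel covers the whole (input_height, 1) input, there is exactly one output position, and its channels equal the Dense outputs.

```python
import numpy as np

def dense_forward(x, w):
    # x: (batch, n_in), w: (n_in, n_out); no bias, as in the example above
    return x @ w

def conv2d_valid(x, k):
    # Naive 'valid' 2D cross-correlation in channels_last layout (assumed to
    # mirror Keras Conv2D): x: (batch, H, W, C_in), k: (kh, kw, C_in, C_out)
    batch, h, w, _ = x.shape
    kh, kw, _, c_out = k.shape
    out = np.zeros((batch, h - kh + 1, w - kw + 1, c_out))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[:, i:i + kh, j:j + kw, :]  # (batch, kh, kw, C_in)
            # sum over kh, kw, C_in -> (batch, C_out)
            out[:, i, j, :] = np.tensordot(patch, k, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
n_in, n_out, batch = 13, 20, 4
x = rng.normal(size=(batch, n_in))
w = rng.normal(size=(n_in, n_out))

dense_out = dense_forward(x, w)  # (4, 20)
# Same data as an (n_in, 1) "image" with one channel, and the Dense weights
# reshaped into an (n_in, 1) kernel with one input and n_out output channels
conv_out = conv2d_valid(x.reshape(batch, n_in, 1, 1), w.reshape(n_in, 1, 1, n_out))
print(conv_out.shape)  # (4, 1, 1, 20)
print(np.allclose(conv_out.reshape(batch, n_out), dense_out))  # True
```

So the layer math itself matches; a gap in training quality points elsewhere, for example to how the data and labels are shaped.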
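I got it! You have to reshape the output tensor so that it has only two dimensions, (batch_size, 1).
With that, I get this evaluation on the test data: loss: 17.9552 - mae: 2.7125 - mse: 17.9552
It is slightly higher than your result with Dense layers, but at least it looks comparable.
Here is my model: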
filters = 20
model = keras.Sequential([
    layers.InputLayer(input_shape=(input_height,)),
    # first Conv layer
    layers.Reshape((input_height, 1, 1)),
    layers.Conv2D(filters, (input_height, 1), data_format='channels_last', padding='valid'),
    layers.Activation('relu'),
    # second conv layer
    layers.Reshape((filters, 1, 1)),
    layers.Conv2D(1, (filters, 1)),
    # reshape the final result !!!
    layers.Reshape((1,)),
])
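The answer's stack can be traced with NumPy to see why it behaves like Dense(20, relu) followed by Dense(1) and ends in shape (batch_size, 1). This sketch assumes channels_last, no biases and randomly chosen weights; `full_kernel_conv`, `k1` and `k2` are hypothetical names. Because each kernel spans the full spatial extent of its input, each Conv2D reduces to a matrix product over the flattened input, and the Reshape((filters, 1, 1)) step moves the 20 channels back into the height axis so the second kernel can sum over them.

```python
import numpy as np

def full_kernel_conv(x, k):
    # 'valid' Conv2D whose kernel is as large as the input, channels_last:
    # x: (batch, H, W, C_in), k: (H, W, C_in, C_out) -> (batch, 1, 1, C_out)
    out = np.einsum('bhwc,hwco->bo', x, k)
    return out.reshape(x.shape[0], 1, 1, k.shape[-1])

rng = np.random.default_rng(1)
batch, input_height, filters = 5, 13, 20
x = rng.normal(size=(batch, input_height))
k1 = rng.normal(size=(input_height, 1, 1, filters))  # first Conv2D kernel
k2 = rng.normal(size=(filters, 1, 1, 1))             # second Conv2D kernel

h = x.reshape(batch, input_height, 1, 1)      # Reshape((input_height, 1, 1))
h = np.maximum(full_kernel_conv(h, k1), 0.0)  # Conv2D + relu -> (batch, 1, 1, filters)
h = h.reshape(batch, filters, 1, 1)           # Reshape((filters, 1, 1))
h = full_kernel_conv(h, k2)                   # Conv2D(1, (filters, 1)) -> (batch, 1, 1, 1)
y_conv = h.reshape(batch, 1)                  # Reshape((1,)) -> (batch, 1)

# The same computation written as Dense(filters, relu) -> Dense(1)
y_dense = np.maximum(x @ k1.reshape(input_height, filters), 0.0) @ k2.reshape(filters, 1)
print(y_conv.shape, np.allclose(y_conv, y_dense))  # (5, 1) True
```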
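There are two problems here:
- The shape of the features, (None, input_height, 1), does not match the shape of the model input, (None, input_height, 1, 1).
- The shape of the labels, (None, 1), does not match the shape of the model output, (None, 1, 1, 1).
Each of these affects the model's performance, and both fixes are needed to reach the performance of the Dense-based model.
The fix (add an extra dimension to the features, reshape the labels):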
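The label mismatch can be illustrated with plain NumPy broadcasting. With labels of shape (N, 1) and predictions of shape (N, 1, 1, 1), an element-wise squared difference broadcasts to (N, 1, N, 1), so every prediction is compared with every label. The sketch assumes the Keras MSE loss ends up doing exactly this (the missing error message suggests it broadcasts rather than rejects the shapes); `broadcast_mse` is a hypothetical helper and the labels are made-up numbers.

```python
import numpy as np

def broadcast_mse(y_true, y_pred):
    # Squared error averaged over all entries after NumPy broadcasting;
    # an assumed stand-in for what the loss ends up computing
    return float(np.mean(np.square(y_pred - y_true)))

y = np.array([10.0, 20.0, 30.0, 40.0])       # made-up labels
y_true = y.reshape(-1, 1)                    # labels: (N, 1)

perfect = y.reshape(-1, 1, 1, 1)             # exact predictions in the model's output shape (N, 1, 1, 1)
constant = np.full((4, 1, 1, 1), y.mean())   # always predict the mean

print((perfect - y_true).shape)              # (4, 1, 4, 1): every prediction vs every label
print(broadcast_mse(y_true, perfect))        # 250.0, twice the label variance
print(broadcast_mse(y_true, constant))       # 125.0, the label variance
# With the labels reshaped to (N, 1, 1, 1), as the fix does, the shapes line up again
print(broadcast_mse(y.reshape(-1, 1, 1, 1), perfect))  # 0.0
```

Under that assumed loss, the best prediction for every sample is the label mean, so training is pulled toward a constant output even though exact predictions exist. This would fit the much higher Conv2D error, although it has not been checked against the exact numbers above.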
if model_type == 0:
    ...
else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1, 1))
    train_labels_norm = np.reshape(train_labels_norm, (-1, 1, 1, 1))
    test_labels_norm = np.reshape(test_labels_norm, (-1, 1, 1, 1))
    ...
Should this be expected or are we doing something wrong?
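No, this is not expected. I'm not sure whether the original code can be considered wrong. Since it did not complain about a shape mismatch, as it usually does, I expected that the "missing" dimensions would not matter because they are 1. Well, they do.
Thanks @elbe. Your answer was the key to realizing the problems above.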
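Since Keras raised no error here, an explicit shape comparison before calling fit() is one way to catch this kind of silent broadcast. This is a deliberately strict sketch (it also flags shape pairs Keras may tolerate, such as (N,) vs (N, 1)), assuming NumPy 1.20 or later for `np.broadcast_shapes`; `check_label_shape` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def check_label_shape(pred_shape, label_shape, batch=2):
    # Replace unknown (None) dimensions, as Keras reports the batch axis,
    # with a concrete size > 1 so that a broadcast becomes visible
    pred = tuple(batch if d is None else d for d in pred_shape)
    label = tuple(batch if d is None else d for d in label_shape)
    if pred != label:
        # np.broadcast_shapes itself raises ValueError if the shapes are incompatible
        combined = np.broadcast_shapes(pred, label)
        raise ValueError(f"prediction shape {pred} and label shape {label} "
                         f"would broadcast to {combined}")
    return pred

print(check_label_shape((None, 1), (None, 1)))  # (2, 1)
try:
    check_label_shape((None, 1, 1, 1), (None, 1))  # the original Conv2D setup
except ValueError as err:
    print(err)  # reports the broadcast shape (2, 1, 2, 1)
```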