Why does a model based on Dense layers give better results than one based on Conv2D?

In TensorFlow, training a model based on Dense layers gives better results than training one based on equivalent Conv2D layers.

Results:

  1. Using Dense: loss: 16.1930 - mae: 2.5369 - mse: 16.1930
  2. Using Conv2D: loss: 83.7851 - mae: 6.5585 - mse: 83.7851

Is this to be expected, or are we doing something wrong?

The code we used is as follows (adapted from here):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np
import pandas as pd
import sys

model_type = int(sys.argv[1]) # 0: Dense, Else: Conv2D

verbose = 0

# load data & normalize

(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()

train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features_norm = (train_features - train_mean) / train_std
test_features_norm = (test_features - train_mean) / train_std

train_labels_norm = train_labels
test_labels_norm = test_labels

input_height = train_features_norm.shape[1]

# model

if model_type == 0:
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height,)),
        layers.Dense(20, activation='relu'),
        layers.Dense(1)])

else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1))
    
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height, 1, 1)),
        layers.Conv2D(20, (input_height, 1), activation='relu'),
        layers.Conv2D(1, (1, 1))]) # replacing this layer with Dense(1) gives the same results
    
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss='mse',
    metrics=['mae', 'mse'])

model.summary()

# training

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=50)

history = model.fit(
    train_features_norm,
    train_labels_norm,
    epochs=1000,
    verbose=verbose,
    validation_split=0.1,
    callbacks=[early_stop])

# results

hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist)

rmse_final = np.sqrt(hist['val_mse'].iloc[-1])
print('Final Root Mean Square Error on validation set: {}'.format(round(rmse_final, 3)))

# compare how the model performs on the test dataset
mse, _, _ = model.evaluate(test_features_norm, test_labels_norm)
rmse = np.sqrt(mse)
print('Root Mean Square Error on test set: {}'.format(round(rmse, 3)))

Note: model_type can be used to select either the model based on Dense layers (= 0) or the one based on Conv2D (any other value).
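For example (assuming the script above is saved as compare.py; the file name is just for illustration):

    python compare.py 0    # train and evaluate the Dense-based model
    python compare.py 1    # train and evaluate the Conv2D-based model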


Background

We have a system that does not support Dense layers (a BeagleBone AI using TIDL). It does, however, support Conv2D layers, and as far as we know a Conv2D layer can be configured to be equivalent to a Dense layer.

For example, in a Dense layer with two units/outputs, no bias, and two inputs, the outputs are:

    O1 = I1*W11 + I2*W21
    O2 = I1*W12 + I2*W22

O - outputs, I - inputs, W - weights (Wij is the weight from input i to output j)

Similarly, in a Conv2D layer with two 1x1 output channels, no bias, one 1x2 input channel, and 1x2 kernels, the outputs are:

    O1 = I1*K11 + I2*K12
    O2 = I1*K21 + I2*K22

O - output channels, I - input channel, K - kernel weights

This means the two are mathematically equivalent, yet training with the Dense layer performs much better.
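The claimed equivalence can be checked numerically. Below is a minimal sketch (not from the original post) that copies the weights of a small bias-free Dense layer into a Conv2D layer shaped as described above and verifies that both produce the same outputs:

    import numpy as np
    from tensorflow.keras import layers

    n_in, n_out = 2, 2

    dense = layers.Dense(n_out, use_bias=False)
    dense.build((None, n_in))                    # kernel shape: (n_in, n_out)

    conv = layers.Conv2D(n_out, (n_in, 1), use_bias=False)
    conv.build((None, n_in, 1, 1))               # kernel shape: (n_in, 1, 1, n_out)

    # copy the Dense weights into the Conv2D kernel
    w = dense.get_weights()[0]
    conv.set_weights([w.reshape(n_in, 1, 1, n_out)])

    x = np.random.rand(4, n_in).astype(np.float32)
    y_dense = dense(x)                           # shape (4, n_out)
    y_conv = conv(x.reshape(4, n_in, 1, 1))      # shape (4, 1, 1, n_out)

    print(np.allclose(y_dense, np.squeeze(y_conv)))  # True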

I see it! You have to reshape the output tensor so that it has only two dimensions, (batch_size, 1).
With that, I get this evaluation on the test data: loss: 17.9552 - mae: 2.7125 - mse: 17.9552
That is slightly higher than your result with Dense layers, but at least it looks comparable.
Here is my model:

  filters = 20
  model = keras.Sequential([
      layers.InputLayer(input_shape=(input_height,)),

      # first Conv layer
      layers.Reshape((input_height, 1, 1)),
      layers.Conv2D(filters, (input_height, 1), data_format='channels_last', padding='valid'),
      layers.Activation('relu'),
      # second conv layer
      layers.Reshape((filters, 1, 1)),
      layers.Conv2D(1, (filters, 1)),

      # reshape the final result !!!
      layers.Reshape((1,)), 
      ])
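A quick way to confirm that the final Reshape((1,)) brings the output back to (batch_size, 1), matching the labels (illustrative, using the model above):

    dummy = np.zeros((4, input_height), dtype=np.float32)  # a batch of 4 samples
    print(model(dummy).shape)                              # -> (4, 1)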

There are two issues here:

  1. The shape of the features, (None, input_height, 1), does not match the shape of the model's input, (None, input_height, 1, 1).
  2. The shape of the labels, (None, 1), does not match the shape of the model's output, (None, 1, 1, 1).

Each of these on its own degrades the model's performance, and both had to be fixed to reach the performance level of the Dense-based model.
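The label mismatch is the more damaging of the two: the loss subtracts y_true from y_pred, and with shapes (None, 1, 1, 1) and (None, 1) that subtraction silently broadcasts instead of failing. A minimal numpy sketch of the effect (illustrative, with a batch size of 3):

    import numpy as np

    batch = 3
    y_pred = np.arange(batch, dtype=np.float32).reshape(batch, 1, 1, 1)  # model output
    y_true = np.arange(batch, dtype=np.float32).reshape(batch, 1)        # labels

    # y_true is padded to (1, 1, batch, 1), so the difference has shape
    # (batch, 1, batch, 1): every prediction is compared with every label
    diff = y_pred - y_true
    print(diff.shape)                # (3, 1, 3, 1)

    print(np.mean(np.square(diff)))  # ~1.333, polluted by cross terms
    print(np.mean(np.square(y_pred.reshape(-1) - y_true.reshape(-1))))  # 0.0, the intended per-sample MSE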

The fix (add an extra dim to the features, reshape the labels):

if model_type == 0:
    ...

else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1, 1))

    train_labels_norm = np.reshape(train_labels_norm, (-1, 1, 1, 1))
    test_labels_norm = np.reshape(test_labels_norm, (-1, 1, 1, 1))
    
    ...

Should this be expected or are we doing something wrong?

No, this is not to be expected, though I'm not sure the original code can be considered wrong either. Since it didn't complain about the mismatched shapes (as it usually does), my expectation was that the "missing" dimensions wouldn't matter because they are all 1. Well, they do matter.

Thanks @elbe. Your answer was the key to my realizing the issues above.