Why does a model based on Dense layers give better results than one based on Conv2D?

Question

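In TensorFlow, a model based on Dense layers trains to better results than a model based on equivalent Conv2D layers.

Results:

Using Dense: loss: 16.1930 - mae: 2.5369 - mse: 16.1930
Using Conv2D: loss: 83.7851 - mae: 6.5585 - mse: 83.7851

Should this be expected or are we doing something wrong?

The code we are using is the following (adapted from here):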

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np
import pandas as pd
import sys

model_type = int(sys.argv[1]) # 0: Dense, Else: Conv2D

verbose = 0

# load data & normalize

(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()

train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features_norm = (train_features - train_mean) / train_std
test_features_norm = (test_features - train_mean) / train_std

train_labels_norm = train_labels
test_labels_norm = test_labels

input_height = train_features_norm.shape[1]

# model

if model_type == 0:
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height)),
        layers.Dense(20, activation='relu'),
        layers.Dense(1)])

else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1))
    
    model = keras.Sequential([
        layers.InputLayer(input_shape=(input_height, 1, 1)),
        layers.Conv2D(20, (input_height, 1), activation='relu'),
        layers.Conv2D(1, (1, 1))]) # replacing this layer with Dense(1) gives the same results
    
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss='mse',
    metrics=['mae', 'mse'])

model.summary()

# training

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=50)

history = model.fit(
    train_features_norm,
    train_labels_norm,
    epochs=1000,
    verbose=verbose,
    validation_split=0.1)

# results

hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist)

rmse_final = np.sqrt(float(hist['val_mse'].tail(1)))
print('Final Root Mean Square Error on validation set: {}'.format(round(rmse_final, 3)))

# compare how the model performs on the test dataset
mse, _, _ = model.evaluate(test_features_norm, test_labels_norm)
rmse = np.sqrt(mse)
print('Root Mean Square Error on test set: {}'.format(round(rmse, 3)))

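Note: model_type selects either the model based on Dense layers (= 0) or the model based on Conv2D (any other value).

Background

We have a system that does not support Dense layers (BeagleBone AI using TIDL). However, it does support Conv2D layers and, as far as we can tell, a Conv2D layer can be configured to be equivalent to a Dense layer.

For example, in a Dense layer with two units/outputs, no bias, and two inputs, the outputs are: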

O1 = W11 * I1 + W12 * I2
O2 = W21 * I1 + W22 * I2

O: output, I: input, W: weight

Similarly, in a Conv2D layer with two 1x1 output channels, no bias, one 1x2 input channel, and a 1x2 kernel, the outputs are:

O1 = K11 * I11 + K12 * I12
O2 = K21 * I11 + K22 * I12

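O: output channel, I: input channel, K: kernel weight

This means the two layers are mathematically equivalent. Yet training gives better results with the Dense layers.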
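The equivalence claimed above can be checked numerically. The following is a minimal NumPy sketch, not TensorFlow code: it assumes a single input channel of height 2 and width 1 (the orientation used by the script, rather than the 1x2 layout in the equations) and kernels that span the full input, so each kernel is applied at exactly one position. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 2, 2

x = rng.normal(size=(n_inputs,))          # I1, I2
w = rng.normal(size=(n_units, n_inputs))  # W[j, i], as in the Dense equations

# Dense layer without bias: O_j = sum_i W[j, i] * I_i
dense_out = w @ x

# Conv2D view: one input channel of shape (n_inputs, 1) and n_units kernels
# of the same shape. With 'valid' padding each kernel fits at one position,
# so the convolution reduces to an elementwise product followed by a sum.
image = x.reshape(n_inputs, 1)             # (height, width)
kernels = w.reshape(n_units, n_inputs, 1)  # (out_channels, kernel_h, kernel_w)
conv_out = np.array([np.sum(k * image) for k in kernels])

print(np.allclose(dense_out, conv_out))  # prints True
```

The two outputs agree because the kernel weights K play exactly the role of the Dense weights W, which suggests that any difference in training results comes from something other than the layer arithmetic.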

Answer 1

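I figured it out! You have to reshape the output tensor so that it has only two dimensions, (batch_size, 1).
I get this evaluation on the test data: loss: 17.9552 - mae: 2.7125 - mse: 17.9552
That is slightly higher than your result with the Dense layers, but at least it looks comparable.
Here is my model: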

filters = 20
model = keras.Sequential([
    layers.InputLayer(input_shape=(input_height,)),

    # first conv layer
    layers.Reshape((input_height, 1, 1)),
    layers.Conv2D(filters, (input_height, 1), data_format='channels_last', padding='valid'),
    layers.Activation('relu'),
    # second conv layer
    layers.Reshape((filters, 1, 1)),
    layers.Conv2D(1, (filters, 1)),

    # reshape the final result !!!
    layers.Reshape((1,)),
])
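To see why the final Reshape matters, it can help to trace the per-sample shapes (batch dimension omitted) through the layers. The sketch below is plain Python and does not call Keras. The helper conv2d_valid_shape is hypothetical and assumes stride 1, 'valid' padding, and channels_last ordering, and input_height = 13 is the number of features in the Boston housing data.

```python
def conv2d_valid_shape(input_shape, filters, kernel_size):
    """Shape (height, width, channels) after a stride-1, 'valid'-padded,
    channels_last Conv2D applied to a (height, width, channels) input."""
    height, width, _ = input_shape
    kernel_h, kernel_w = kernel_size
    return (height - kernel_h + 1, width - kernel_w + 1, filters)


input_height = 13  # number of features in the Boston housing data
filters = 20

after_reshape_1 = (input_height, 1, 1)
after_conv_1 = conv2d_valid_shape(after_reshape_1, filters, (input_height, 1))
after_reshape_2 = (filters, 1, 1)  # same 20 values, moved from channels to height
after_conv_2 = conv2d_valid_shape(after_reshape_2, 1, (filters, 1))
after_reshape_3 = (1,)             # drops the singleton axes

for name, shape in [
    ("Reshape 1", after_reshape_1),
    ("Conv2D 1", after_conv_1),
    ("Reshape 2", after_reshape_2),
    ("Conv2D 2", after_conv_2),
    ("Reshape 3", after_reshape_3),
]:
    print(name, shape)
```

Without the last Reshape the model would emit (1, 1, 1) per sample, which no longer has the same rank as one scalar label per sample.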

Answer 2

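There are two problems here:

The shape of the features, (None, input_height, 1), does not match the shape of the model's input, (None, input_height, 1, 1).
The shape of the labels, (None, 1), does not match the shape of the model's output, (None, 1, 1, 1).

Each of these has an impact on the model's performance. Both fixes are needed to reach the performance level of the model based on Dense layers.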
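One plausible mechanism for the label mismatch is broadcasting: when the two arrays have different ranks, the subtraction inside the loss can expand to a pairwise grid instead of failing. The NumPy sketch below illustrates the effect with made-up labels. That Keras broadcasts in exactly this way for the versions involved is an assumption and is not confirmed in this thread.

```python
import numpy as np

labels = np.array([10.0, 20.0, 30.0, 40.0])  # made-up labels, shape (4,)

# A "perfect" model: predictions equal the labels, in the Conv2D output shape.
preds = labels.reshape(-1, 1, 1, 1)          # shape (4, 1, 1, 1)

# Mismatched ranks: (4, 1, 1, 1) minus (4, 1) broadcasts to (4, 1, 4, 1),
# so every prediction is compared with every label in the batch.
diff_mismatched = preds - labels.reshape(-1, 1)
mse_mismatched = np.mean(diff_mismatched ** 2)

# Matched shapes: each prediction is compared only with its own label.
diff_matched = preds - labels.reshape(-1, 1, 1, 1)
mse_matched = np.mean(diff_matched ** 2)

print(diff_mismatched.shape, mse_mismatched)  # (4, 1, 4, 1) 250.0
print(diff_matched.shape, mse_matched)        # (4, 1, 1, 1) 0.0
```

Under this assumption even a perfect model reports a large error, and minimizing the broadcast loss pulls every prediction toward the batch mean, which would be consistent with the much worse Conv2D numbers in the question.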

The fix (add the extra dimension to the features, reshape the labels):

if model_type == 0:
    ...

else:
    train_features_norm = np.reshape(train_features_norm, (-1, input_height, 1, 1))
    test_features_norm = np.reshape(test_features_norm, (-1, input_height, 1, 1))

    train_labels_norm = np.reshape(train_labels_norm, (-1, 1, 1, 1))
    test_labels_norm = np.reshape(test_labels_norm, (-1, 1, 1, 1))
    
    ...

Should this be expected or are we doing something wrong?

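No, it is not expected. I am not sure the original code can be considered wrong, though. My expectation (since it did not complain about a shape mismatch as it usually does) was that, because the "missing" dimensions have size 1, they would not matter. Well, they do.

Thank you @elbe. Your answer was the key to my realizing the problems above.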

