Using TFDS datasets with Keras Functional API

I am trying to train a neural network built with the Keras Functional API on one of the default TFDS datasets, but I keep getting dataset-related errors.

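The idea is to build an object detection model, but as a first draft I am just trying to do plain image classification (img, label). The input will be 256x256x3 images. The input layer is as follows: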

img_inputs = keras.Input(shape=[256, 256, 3], name='image')

Then I try to use the voc2007 dataset available in TFDS (a very old and lightweight version, to keep things fast):

(train_ds, test_ds), ds_info = tfds.load(
    'voc/2007',
    split=['train', 'test'],
    data_dir="/content/drive/My Drive",
    with_info=True)

The data is then preprocessed as follows:

def resize_and_normalize_img(example):
  """Normalizes images: `uint8` -> `float32`."""
  example['image'] = tf.image.resize(example['image'], [256, 256])
  example['image'] = tf.cast(example['image'], tf.float32) / 255.
  return example

def reduce_for_classification(example):
  """Drops every key except `image` and `labels`."""
  for key in ['image/filename', 'labels_no_difficult', 'objects']:
    example.pop(key)
  return example

train_ds_class = train_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.cache()
train_ds_class = train_ds_class.shuffle(ds_info.splits['train'].num_examples)
train_ds_class = train_ds_class.batch(64)
train_ds_class = train_ds_class.prefetch(tf.data.AUTOTUNE)

test_ds_class = test_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
test_ds_class = test_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
test_ds_class = test_ds_class.batch(64)
test_ds_class = test_ds_class.cache()
test_ds_class = test_ds_class.prefetch(tf.data.AUTOTUNE)

And then the model is fitted like this:

epochs = 8
history = model.fit(
  train_ds_class,
  validation_data=test_ds_class,
  epochs=epochs
)

After doing this, I get an error saying that my model expects an input of shape [None, 256, 256, 3] but got an input of shape [256, 256, 3].

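I think this is an issue with the labels. I previously had problems with the extra keys in the dictionary-like format of the data I get from tfds, so I tried removing everything except the label, but I still get this error and do not know how to move forward. I feel that once the dataset has been prepared with tfds it should be ready to feed into a model. After going through the documentation, tutorials, and Stack Overflow I have not found the answer, and I hope someone who has run into this can help.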
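A quick way to narrow this kind of error down is to look at what the dataset actually yields. The sketch below uses a small synthetic dataset as a stand-in for voc/2007 (the generator `fake_voc_examples` and the 8x8 image size are made up for illustration) and prints `element_spec` before and after mapping each dict to an `(image, labels)` tuple, which is the structure `model.fit` expects when it is given a `tf.data.Dataset`.

```python
import numpy as np
import tensorflow as tf


def fake_voc_examples():
    # Hypothetical stand-in for TFDS examples: a dict per example,
    # with a variable number of label ids per image.
    yield {"image": np.zeros((8, 8, 3), np.uint8), "labels": np.array([3, 7], np.int64)}
    yield {"image": np.zeros((8, 8, 3), np.uint8), "labels": np.array([1], np.int64)}


ds = tf.data.Dataset.from_generator(
    fake_voc_examples,
    output_signature={
        "image": tf.TensorSpec(shape=(8, 8, 3), dtype=tf.uint8),
        "labels": tf.TensorSpec(shape=(None,), dtype=tf.int64),
    },
)
print(ds.element_spec)  # a dict of specs, not an (inputs, targets) pair


def to_tuple(example):
    # Return (inputs, targets) instead of a dict.
    image = tf.cast(example["image"], tf.float32) / 255.0
    return image, example["labels"]


tuple_ds = ds.map(to_tuple)
image_spec, label_spec = tuple_ds.element_spec
print(image_spec, label_spec)

# Batching only changes the spec here; nothing is iterated yet.
batched = tuple_ds.batch(2)
print(batched.element_spec[0].shape)  # leading batch dimension is None
```

If the spec of the dataset passed to `fit` is still a dict, or its image component lacks the leading `None` dimension, that points at the pipeline rather than at the model.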

Update: To give a bit more information, this is the model I am using:

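TL;DR: Image input 256x256x3, a succession of convolutions and residual blocks, ending with average pooling, a fully connected layer, and softmax, which results in a (None, 1280) tensor. Using sparse categorical cross-entropy as the loss and accuracy as the metric.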

import tensorflow as tf
from tensorflow import keras

img_inputs = keras.Input(shape=[256, 256, 3], name='image')

# first convolution
conv_first = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), padding='same', name='first_conv')
x = conv_first(img_inputs)

# Second convolution
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=2, padding='same', name='second_conv')(x)

# First residual block
res = tf.keras.layers.Conv2D(32, kernel_size=(1, 1), name='res_block1_conv1')(x)
res = tf.keras.layers.Conv2D(64, kernel_size=(3, 3), padding='same', name='res_block1_conv2')(res)
x = x + res

# Convolution after First residual block
x = tf.keras.layers.Conv2D(128, kernel_size=3, strides=2, padding='same', name='first_post_res_conv')(x)

# Second residual Block
for i in range(2):
  shortcut = x
  res = tf.keras.layers.Conv2D(64, kernel_size=1, name=f'res_block2_conv1_loop{i}')(x)
  res = tf.keras.layers.Conv2D(128, kernel_size=3, padding='same', name=f'res_block2_conv2_loop{i}')(res)

  x = res + shortcut

# Convolution after Second residual block
x = tf.keras.layers.Conv2D(256, 3, strides=2, padding='same', name='second_post_res_conv')(x)

# Third residual Block
for i in range(8):
  shortcut = x
  res = tf.keras.layers.Conv2D(128, kernel_size=1, name=f'res_block3_conv1_loop{i}')(x)
  res = tf.keras.layers.Conv2D(256, kernel_size=3, padding='same', name=f'res_block3_conv2_loop{i}')(res)

  x = res + shortcut

# Convolution after Third residual block
x = tf.keras.layers.Conv2D(512, 3, strides=2, padding='same', name='third_post_res_conv')(x)

# Fourth residual Block
for i in range(8):
  shortcut = x
  res = tf.keras.layers.Conv2D(256, kernel_size=1, name=f'res_block4_conv1_loop{i}')(x)
  res = tf.keras.layers.Conv2D(512, kernel_size=3, padding='same', name=f'res_block4_conv2_loop{i}')(res)

  x = res + shortcut

# Convolution after Fourth residual block
x = tf.keras.layers.Conv2D(1024, 3, strides=2, padding='same', name='fourth_post_res_conv')(x)

# Fifth residual Block
for i in range(4):
  shortcut = x
  res = tf.keras.layers.Conv2D(512, kernel_size=1, name=f'res_block5_conv1_loop{i}')(x)
  res = tf.keras.layers.Conv2D(1024, kernel_size=3, padding='same', name=f'res_block5_conv2_loop{i}')(res)

  x = res + shortcut

# Global avg pooling
x = tf.keras.layers.GlobalAveragePooling2D(name='average_pooling')(x)

# Fully connected layer
x = tf.keras.layers.Dense(1280, name='fully_connected_layer')(x)

# Softmax
end_result = tf.keras.layers.Softmax(name='softmax')(x)

model = tf.keras.Model(inputs=img_inputs, outputs=end_result, name="darknet53")

model.compile(optimizer='adam',
              # the model already ends in a Softmax layer, so the loss receives probabilities, not logits
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
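A side note on the last two steps of the model, given as a small numeric sketch and not as a claim about what causes the error: a loss created with `from_logits=True` applies softmax internally, so feeding it the output of a `Softmax` layer normalizes twice. The logits and label below are arbitrary values chosen for illustration.

```python
import tensorflow as tf

# Arbitrary example: one sample, three classes, true class 0.
logits = tf.constant([[2.0, 0.0, -1.0]])
labels = tf.constant([0])
probs = tf.nn.softmax(logits)

scc_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
scc_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

loss_from_logits = scc_logits(labels, logits)  # raw scores + from_logits=True
loss_from_probs = scc_probs(labels, probs)     # probabilities + from_logits=False
loss_double = scc_logits(labels, probs)        # probabilities + from_logits=True

print(float(loss_from_logits), float(loss_from_probs), float(loss_double))
```

The first two combinations agree, while the third gives a different (here larger) loss because the probabilities are squashed a second time.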

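After trying the solution proposed by AloneTogether, I am getting the following error (I tried changing the axis in the tf.one_hot() function several times, with the same result):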

Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [64,1280] and labels shape [1280]
     [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_20172]

This seems to be related to batching, but I do not know exactly how to fix it.
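One plausible reading of the shapes in that message: the model ends in Dense(1280), which gives logits of shape [64, 1280], while a batch of 64 multi-hot label vectors of length 20 holds 64 x 20 = 1280 values, which would match labels of shape [1280] if the sparse loss flattens its targets. Under that reading the two 1280s are a coincidence, and the mismatch is between a sparse (one integer per image) loss and multi-hot targets. The sketch below shows a combination whose shapes do line up: a 20-unit sigmoid head with binary cross-entropy. The 8-feature input and the batch of 4 are made-up stand-ins, not the model from the post.

```python
import tensorflow as tf

NUM_CLASSES = 20  # voc/2007 has 20 object classes
BATCH = 4         # small batch, for illustration only

# Multi-hot targets: one row per image, one column per class.
row = [1.0 if c in (3, 7) else 0.0 for c in range(NUM_CLASSES)]
y_true = tf.constant([row] * BATCH)

# Tiny stand-in head: one independent sigmoid output per class.
inputs = tf.keras.Input(shape=(8,))
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid")(inputs)
head = tf.keras.Model(inputs, outputs)

y_pred = head(tf.random.normal([BATCH, 8]))
loss = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)

print(y_true.shape, y_pred.shape, float(loss))
```

Targets and predictions both have shape [batch, 20] here, so the loss reduces to a single scalar without any reshaping.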

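The whole problem does seem to be related to the label encoding, because when I run that line without the tf.reduce_sum() function I get the same kind of error, but with: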

First element had shape [2,20] and element 1 had shape [1,20].
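The [2,20] and [1,20] shapes are what `tf.one_hot` returns for an image with two labels and an image with one label: one row per label id. Summing over axis 0 collapses those rows into a single fixed-length vector. A minimal check, using made-up label ids:

```python
import tensorflow as tf

NUM_CLASSES = 20

two_labels = tf.constant([3, 7], dtype=tf.int64)  # image with two classes
one_label = tf.constant([1], dtype=tf.int64)      # image with one class

rows_two = tf.one_hot(two_labels, NUM_CLASSES)    # one row per label id
rows_one = tf.one_hot(one_label, NUM_CLASSES)

# Collapse the per-label rows into one vector per image.
vec_two = tf.reduce_sum(rows_two, axis=0)
vec_one = tf.reduce_sum(rows_one, axis=0)

print(rows_two.shape, rows_one.shape)
print(vec_two.shape, vec_one.shape)
```

After the reduction every image has a label vector of the same length, regardless of how many classes it contains.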

Likewise, if I run it without the one-hot encoding line, I get this error:

Node: 'IteratorGetNext' Cannot batch tensors with different shapes in component 1. First element had shape [4] and element 1 had shape [1]. [[{{node IteratorGetNext}}]] [Op:__inference_train_function_18534]
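This message can be reproduced without the real data. The sketch below assumes a made-up generator, `ragged_labels`, that yields label vectors of length 4 and 1; `Dataset.batch` cannot stack them, whereas the fixed-length vectors produced by the one-hot step batch without trouble.

```python
import numpy as np
import tensorflow as tf


def ragged_labels():
    # Hypothetical label ids: four classes in one image, one in the next.
    yield np.array([0, 4, 8, 11], np.int64)
    yield np.array([5], np.int64)


ds = tf.data.Dataset.from_generator(
    ragged_labels,
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int64),
)

# Stacking vectors of different lengths fails when the batch is built.
try:
    next(iter(ds.batch(2)))
    batch_failed = False
except tf.errors.InvalidArgumentError as err:
    batch_failed = True
    print(err.message)

# Fixed-length multi-hot vectors can be stacked into a [2, 20] batch.
fixed = ds.map(lambda y: tf.reduce_sum(tf.one_hot(y, 20), axis=0))
first_batch = next(iter(fixed.batch(2)))
print(first_batch.shape)
```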

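I think the problem is that each image can belong to multiple classes, so I would recommend one-hot encoding the labels. It should then work. Here is an example: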

import tensorflow as tf
import tensorflow_datasets as tfds 

def resize_and_normalize_img(example):
  """Normalizes images: `uint8` -> `float32`."""
  example['image'] = tf.image.resize(example['image'], [256, 256])
  example['image'] = tf.cast(example['image'], tf.float32) / 255.
  return example['image'], example['labels']

def reduce_for_classification(example):
  """Drops every key except `image` and `labels`."""
  for key in ['image/filename', 'labels_no_difficult', 'objects']:
    example.pop(key)
  return example

(train_ds, test_ds), ds_info = tfds.load('voc/2007', split=['train', 'test'], with_info=True)

train_ds_class = train_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.map(lambda x, y: (x, tf.reduce_sum(tf.one_hot(y, 20, axis=-1), axis=0)))

train_ds_class = train_ds_class.cache()
train_ds_class = train_ds_class.shuffle(ds_info.splits['train'].num_examples)
train_ds_class = train_ds_class.batch(64)
train_ds_class = train_ds_class.prefetch(tf.data.AUTOTUNE)

inputs = tf.keras.layers.Input(shape=[256, 256, 3], name='image')
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(50, activation='relu')(x)
outputs = tf.keras.layers.Dense(20, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(loss='binary_crossentropy', optimizer='adam')

model.fit(train_ds_class, epochs=5)
Epoch 1/5
40/40 [==============================] - 16s 124ms/step - loss: 3.0883
Epoch 2/5
40/40 [==============================] - 5s 115ms/step - loss: 0.9750
Epoch 3/5
40/40 [==============================] - 5s 115ms/step - loss: 0.4578
Epoch 4/5
40/40 [==============================] - 5s 115ms/step - loss: 0.6004
Epoch 5/5
40/40 [==============================] - 5s 115ms/step - loss: 0.3534
<keras.callbacks.History at 0x7f0e59513f50>
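The example above only prepares the training split. One way to reuse the same steps for the test split is to wrap them in a helper, sketched here under a few assumptions: the names `to_image_and_multi_hot`, `prepare`, and `fake_voc_examples` are made up for illustration, the synthetic generator stands in for voc/2007, and `tf.reduce_max` is swapped in for `tf.reduce_sum` so that a repeated label id still yields 1 instead of 2.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 20  # voc/2007 has 20 object classes
IMG_SIZE = 256


def to_image_and_multi_hot(example):
    # Resize, scale to [0, 1], and turn label ids into a fixed-length 0/1 vector.
    image = tf.image.resize(example["image"], [IMG_SIZE, IMG_SIZE])
    image = tf.cast(image, tf.float32) / 255.0
    labels = tf.reduce_max(tf.one_hot(example["labels"], NUM_CLASSES), axis=0)
    return image, labels


def prepare(ds, batch_size=64, shuffle_buffer=None):
    # Same pipeline for every split; shuffling only when a buffer size is given.
    ds = ds.map(to_image_and_multi_hot, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.cache()
    if shuffle_buffer:
        ds = ds.shuffle(shuffle_buffer)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


def fake_voc_examples():
    # Hypothetical examples with different image sizes and label counts.
    yield {"image": np.full((10, 12, 3), 255, np.uint8), "labels": np.array([3, 3, 7], np.int64)}
    yield {"image": np.zeros((7, 5, 3), np.uint8), "labels": np.array([1], np.int64)}


ds = tf.data.Dataset.from_generator(
    fake_voc_examples,
    output_signature={
        "image": tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
        "labels": tf.TensorSpec(shape=(None,), dtype=tf.int64),
    },
)

images, labels = next(iter(prepare(ds, batch_size=2)))
print(images.shape, labels.shape)
```

With the real data this would be called as `prepare(train_ds, shuffle_buffer=ds_info.splits['train'].num_examples)` and `prepare(test_ds)`, and the second result passed to `model.fit` as `validation_data`.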