Generator batch size and batch size as parameter of model.fit()

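Hi, I have a question about the difference between the batch size I set in my generate_train_data function and the batch size passed as a parameter to fit(). If I want to train on my data with a batch size of 32, and I have already set that as the default in my generator as shown below, do I need to set it again in model.fit()?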

How I import the dataset:

from tensorflow.keras.utils import image_dataset_from_directory

BATCH_SIZE = 32
IMG_SIZE = (224, 224)

train_dataset = image_dataset_from_directory(data_dir,
                                              shuffle=True,
                                              label_mode = 'categorical',
                                              validation_split = 0.2,
                                              batch_size=BATCH_SIZE,
                                              seed = 42,
                                              subset = "training",
                                              image_size=IMG_SIZE
                                             )

validation_dataset = image_dataset_from_directory(data_dir,
                                              shuffle=True,
                                              label_mode = 'categorical',
                                              validation_split = 0.2,
                                              batch_size=BATCH_SIZE,
                                              seed = 42,
                                              subset = "validation",
                                              image_size=IMG_SIZE
                                             )

train_size = int(0.7 * 54305 / BATCH_SIZE)  # number of batches kept for training
test_dataset = train_dataset.skip(train_size)
train_dataset = train_dataset.take(train_size)

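How I define the generator function:

import numpy as np

def generate_train_data(batch_size=32):
  # train_dataset is already batched, so batch_size here is only a default;
  # the actual size of each batch comes from the dataset.
  for image_batch, labels_batch in train_dataset:
      batch_size = len(image_batch)  # the final batch may be smaller
      # Allocate fresh arrays per batch so yielded batches do not share memory.
      x_batch = np.zeros((batch_size, 224, 224, 3))
      y_batch = np.zeros((batch_size,))
      c_batch = np.zeros((batch_size,))
      classes = decode_predictions(labels_batch)  # decode once per batch
      for i in range(batch_size):
        specie_position = specie_list.index(classes[i][0].split('___')[0])
        disease_position = disease_list.index(classes[i][0].split('___')[1])

        x_batch[i] = image_batch[i]
        y_batch[i] = specie_position
        c_batch[i] = disease_position

      # Yield once per batch, after every sample has been filled in.
      yield x_batch, [y_batch, c_batch]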

Below is my model.fit() call:

import itertools
train_gen = itertools.cycle(generate_train_data())
test_gen = itertools.cycle(generate_validation_data())

_ = model.fit(
    train_gen, 
    validation_data = test_gen,
    steps_per_epoch=steps_per_epoch,
    validation_steps=val_steps,
    epochs=100,
    # batch_size=32, ## Do I need to set this? 
    callbacks=keras_callbacks,
)

You don't have to, as you can see here: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

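The batch_size arg is an integer or None: the number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances (since they generate batches).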
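
To illustrate why batch_size is redundant with a generator: every item the generator yields is treated as one batch, so the generator fixes the batch size and fit() only needs to know how many batches make up an epoch. The sketch below is plain Python with no TensorFlow involved. `batches` and `steps_for` are hypothetical helper names, and the 100-sample list is made-up stand-in data, not your dataset.

```python
import math

def batches(samples, batch_size=32):
    """Yield consecutive slices of `samples`; each yielded slice is one batch."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def steps_for(num_samples, batch_size=32):
    """Number of batches needed to cover num_samples once (the last may be partial)."""
    return math.ceil(num_samples / batch_size)

samples = list(range(100))                  # stand-in for 100 training examples
sizes = [len(b) for b in batches(samples)]  # batch sizes as produced by the generator
print(sizes)                                # [32, 32, 32, 4]
print(steps_for(len(samples)))              # 4
```

Under these assumptions, a value like steps_for(num_samples) is what you would pass as steps_per_epoch, while batch_size is left out of model.fit() entirely.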