Generator batch size vs. the batch_size parameter of model.fit()
Hi, I have a question about the difference between the batch size I set in my generate_train_data function and the batch size passed as a parameter to fit(). If I want to train on my data with a batch size of 32, and I have already set that as the default in my generator as shown below, do I need to set it again in my model.fit() call?
How I import the dataset:
BATCH_SIZE = 32
IMG_SIZE = (224, 224)

train_dataset = image_dataset_from_directory(data_dir,
                                             shuffle=True,
                                             label_mode='categorical',
                                             validation_split=0.2,
                                             batch_size=BATCH_SIZE,
                                             seed=42,
                                             subset="training",
                                             image_size=IMG_SIZE)

validation_dataset = image_dataset_from_directory(data_dir,
                                                  shuffle=True,
                                                  label_mode='categorical',
                                                  validation_split=0.2,
                                                  batch_size=BATCH_SIZE,
                                                  seed=42,
                                                  subset="validation",
                                                  image_size=IMG_SIZE)

train_size = int(0.7 * 54305 / 32)
test_dataset = train_dataset.skip(train_size)
train_dataset = train_dataset.take(train_size)
How I define the generator function:
def generate_train_data(batch_size=32):
    x_batch = np.zeros((batch_size, 224, 224, 3))
    y_batch = np.zeros((batch_size,))
    c_batch = np.zeros((batch_size,))
    for image_batch, labels_batch in train_dataset:
        batch_size = len(image_batch)  # the last batch may be smaller than 32
        classes = decode_predictions(labels_batch)  # decode once per batch, not once per sample
        for i in range(batch_size):
            specie_position = specie_list.index(classes[i][0].split('___')[0])
            disease_position = disease_list.index(classes[i][0].split('___')[1])
            x_batch[i] = image_batch[i]
            y_batch[i] = specie_position
            c_batch[i] = disease_position
        # slice so a short final batch does not carry stale samples from the previous one
        yield x_batch[:batch_size], [y_batch[:batch_size], c_batch[:batch_size]]
Below is my model.fit() call:
import itertools
train_gen = itertools.cycle(generate_train_data())
test_gen = itertools.cycle(generate_validation_data())
_ = model.fit(
    train_gen,
    validation_data=test_gen,
    steps_per_epoch=steps_per_epoch,
    validation_steps=val_steps,
    epochs=100,
    # batch_size=32,  ## Do I need to set this?
    callbacks=keras_callbacks,
)
You don't have to, as you can see here: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
The batch_size arg is an integer or None: the number of samples per gradient update. If unspecified, batch_size defaults to 32. Do not specify batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances, since they already generate batches.
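To see why, note that with a batched generator the generator itself fixes how many samples each gradient update sees; fit() only needs to know how many batches make up one epoch (steps_per_epoch). Here is a minimal stdlib-only sketch (no TensorFlow; the dataset size and generator are hypothetical stand-ins for the code above):

```python
import itertools
import math

NUM_SAMPLES = 100   # hypothetical dataset size
BATCH_SIZE = 32     # fixed inside the generator, like generate_train_data

def generate_batches(num_samples=NUM_SAMPLES, batch_size=BATCH_SIZE):
    """Yield (inputs, labels) batches; the last batch may be smaller."""
    for start in range(0, num_samples, batch_size):
        idx = list(range(start, min(start + batch_size, num_samples)))
        yield idx, [i % 2 for i in idx]  # dummy inputs and labels

# With a generator, fit() counts batches (steps), not samples:
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)

# itertools.cycle repeats the batches across epochs, as in the question
gen = itertools.cycle(generate_batches())
batch_sizes = [len(next(gen)[0]) for _ in range(steps_per_epoch)]

print(steps_per_epoch)   # 4
print(batch_sizes)       # [32, 32, 32, 4]
```

Passing batch_size=32 to fit() here would be at best ignored and at worst misleading, since the effective batch size is already decided before fit() ever sees the data.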