Is it logical to loop on model.fit in Keras?

Question

Is it logical to do the following in Keras in order not to run out of memory?

for path in ['xaa', 'xab', 'xac', 'xad']:
    x_train, y_train = prepare_data(path)
    model.fit(x_train, y_train, batch_size=50, epochs=20, shuffle=True)

model.save('model')

Answer 1

Yes, but prefer model.train_on_batch if each iteration produces a single batch. That avoids some of the overhead that comes with fit.
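A minimal sketch of the train_on_batch approach, assuming prepare_data returns indexable arrays as in the question. The helper name train_in_chunks and its parameters are hypothetical, chosen only for illustration; model.train_on_batch is the only Keras call used.

```python
def train_in_chunks(model, paths, prepare_data, batch_size=50, epochs=20):
    """Train one batch at a time so only one chunk is held in memory.

    `model` is assumed to be a compiled Keras model; `prepare_data` is the
    loader from the question, passed in here to keep the sketch self-contained.
    """
    losses = []
    for epoch in range(epochs):
        for path in paths:
            # Load a single chunk; the previous one can be garbage-collected.
            x_train, y_train = prepare_data(path)
            for start in range(0, len(x_train), batch_size):
                stop = start + batch_size
                # One gradient update on one batch, without fit's per-call setup.
                loss = model.train_on_batch(x_train[start:stop], y_train[start:stop])
                losses.append(loss)
    return losses
```

Unlike fit with shuffle=True, this loop does not shuffle the samples inside a chunk, so you may want to permute each chunk yourself before slicing it.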

You can also try creating a generator and using model.fit_generator():

def dataGenerator(paths, batch_size):

    while True:  # generators for Keras must be infinite
        for path in paths:
            # Load one file at a time so only one chunk is in memory
            x_train, y_train = prepare_data(path)

            totalSamps = x_train.shape[0]
            batches = totalSamps // batch_size

            # Add one more batch for the leftover samples, if any
            if totalSamps % batch_size > 0:
                batches += 1

            for batch in range(batches):
                section = slice(batch * batch_size, (batch + 1) * batch_size)
                yield (x_train[section], y_train[section])

Create and use:

gen = dataGenerator(['xaa', 'xab', 'xac', 'xad'], 50)
model.fit_generator(gen,
                    steps_per_epoch = expectedTotalNumberOfYieldsForOneEpoch,
                    epochs = epochs)
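The value passed as steps_per_epoch has to match the number of batches the generator yields in one pass over all files. A small sketch of that arithmetic follows; the function name and the sample counts in the usage line are hypothetical, and it assumes you already know how many samples each file holds.

```python
def steps_per_epoch(samples_per_file, batch_size):
    """Count the batches yielded in one full pass over all files.

    Each file contributes ceil(n / batch_size) batches, because the
    generator emits a final partial batch when n is not a multiple
    of batch_size.
    """
    return sum((n + batch_size - 1) // batch_size for n in samples_per_file)


# Hypothetical sample counts for three files, with batch_size=50:
# 120 -> 3 batches, 100 -> 2 batches, 30 -> 1 batch
total_steps = steps_per_epoch([120, 100, 30], 50)
```

Note that in recent TensorFlow releases fit_generator is deprecated and model.fit accepts a generator directly, with the same steps_per_epoch argument.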

Answer 2

I would suggest having a look at this thread on GitHub.

You could indeed use model.fit(), but training will be more stable if you do it this way:

for epoch in range(20):
    for path in ['xaa', 'xab', 'xac', 'xad']:
        x_train, y_train = prepare_data(path)
        model.fit(x_train, y_train, batch_size=50, epochs=epoch+1, initial_epoch=epoch, shuffle=True)

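This way you iterate over all of your data once per epoch, rather than iterating 20 epochs over one part of the data before switching to the next.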
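The difference between the two loop orders can be made concrete by listing which (file, epoch) pair is visited at each step. This is only an illustrative sketch with hypothetical function names; it does not train anything.

```python
def chunk_major_schedule(paths, epochs):
    """Order used in the question: all epochs on one file, then the next file."""
    return [(path, epoch) for path in paths for epoch in range(epochs)]


def epoch_major_schedule(paths, epochs):
    """Order used in this answer: every file once per epoch."""
    return [(path, epoch) for epoch in range(epochs) for path in paths]
```

With the chunk-major order, the last updates the model sees all come from the final file, so it may drift toward that file's data; the epoch-major order interleaves the files throughout training.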

As mentioned in the thread, another solution is to write your own data generator and use it with model.fit_generator().
