tf.Dataset will not repeat - WARNING:tensorflow:Your input ran out of data; interrupting training
Using TensorFlow's dataset generator works fine without repeat. However, when I use repeat to double my training dataset from 82,000 to 164,000 samples for additional augmentation, I "run out of data."
I have read that steps_per_epoch can be used to "slow cook" a model by spreading a single pass over the training data across several epochs. That is not my intention, but even when I pass a small steps_per_epoch (which should produce that slow-cooking pattern), TF still says I ran out of data.
There is one case where TF says I am close ("in this case, 120 batches"). I have tried plus/minus that value but still hit the error, even with drop_remainder set to True to discard any leftover samples.
Error:
WARNING:tensorflow:Your input ran out of data; interrupting training.
Make sure that your dataset or generator can generate at least
steps_per_epoch * epochs
batches (in this case, 82,000 batches). You
may need to use the repeat() function when building your dataset.
WARNING:tensorflow:Your input ran out of data; interrupting training.
Make sure that your dataset or generator can generate at least
steps_per_epoch * epochs
batches (in this case, 120 batches). You
may need to use the repeat() function when building your dataset.
| Parameters | |
|---|---|
| Train Dataset | 82,000 |
| Val Dataset | 12,000 |
| Test Dataset | 12,000 |
| epochs (early stopping usually stops at about 30) | 100 |
| batch_size | 200 |

**batch_size is the same for the model mini-batch and the generator batch
| Attempt | steps_per_epoch Value | Error |
|---|---|---|
| steps_per_epoch==None | None | "..in this case, 82,000 batches" |
| steps_per_epoch==train_len//batch_size | 820 | "..in this case, 82,000 batches" |
| steps_per_epoch==(train_len//batch_size)-1 | 819 | Training stops halfway: "..in this case, 81,900 batches" |
| steps_per_epoch==(train_len//batch_size)+1 | 821 | Training stops halfway: "..in this case, 82,100 batches" |
| steps_per_epoch==(train_len//batch_size)//2 | 410 | Training seems complete but errors before validation: "..in this case, 120 batches" |
| steps_per_epoch==((train_len//batch_size)//2)-1 | 409 | Same as above: training seems complete but errors before validation: "..in this case, 120 batches" |
| steps_per_epoch==((train_len//batch_size)//2)+1 | 411 | Training seems complete but errors before validation: "..in this case, 41,100 batches" |
| steps_per_epoch==(train_len//batch_size)*2 | 1640 | Training stops at one quarter: "..in this case, 164,000 batches" |
| steps_per_epoch==20 (arbitrarily small number) | 20 | Very surprisingly: "..in this case, 120 batches" |
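For reference, a rough back-of-the-envelope check of the numbers above (a sketch using the round figures from the parameters table, assuming the steps_per_epoch==820 attempt): the warning asks for at least steps_per_epoch * epochs batches, while the repeated-and-batched pipeline can only yield a fixed number of batches before it is exhausted.
# Rough arithmetic behind the warning, using figures from the tables above
# (hypothetical round numbers; adjust to the real dataset sizes).
train_len  = 82_000 * 2      # samples after .repeat(2)
batch_size = 200
epochs     = 100

batches_available = train_len // batch_size     # 820 full batches (drop_remainder=True)
batches_requested = batches_available * epochs  # 82,000 -> the number in the warning

print(batches_available, batches_requested)     # 820 82000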
Generator - the goal is to repeat the training set twice:
trainDS = tf.data.Dataset.from_tensor_slices(trainPaths).repeat(2)
train_len = len(trainDS) #used to calc steps_per_epoch
trainDS = (trainDS
.shuffle(train_len)
.map(load_images, num_parallel_calls=AUTOTUNE)
.map(augment, num_parallel_calls=AUTOTUNE)
.cache('train_cache')
.batch(batch_size, drop_remainder=True)
.prefetch(AUTOTUNE)
)
valDS = tf.data.Dataset.from_tensor_slices(valPaths)
valDS = (valDS
.map(load_images, num_parallel_calls=AUTOTUNE)
.cache('val_cache')
.batch(batch_size, drop_remainder=True)
.prefetch(AUTOTUNE)
)
testDS = tf.data.Dataset.from_tensor_slices(testPaths)
testDS = (testDS
.map(load_images, num_parallel_calls=AUTOTUNE)
.cache('test_cache')
.batch(batch_size, drop_remainder=True)
.prefetch(AUTOTUNE)
)
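As a side note, one way to see how many batches a pipeline like trainDS above will actually yield before it is exhausted (a sketch; Dataset.cardinality() needs a reasonably recent TF 2.x):
# With 164,000 samples after repeat(2), batch_size=200 and drop_remainder=True,
# this should report 820 batches.
print(trainDS.cardinality().numpy())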
Model.fit()
Per the docs, len(train)//batch_size is the default
hist= model.fit(trainDS,
epochs=epochs,
batch_size=batch_size,
validation_data=valDS,
steps_per_epoch= <see attempts table above>,
)
EDIT: Putting repeat last in the pipeline is what worked. Shout-out to @AloneTogether for the tip about removing the batch size from the fit function.
trainDS = tf.data.Dataset.from_tensor_slices(trainPaths)
trainDS = (trainDS
.shuffle(len(trainPaths))
.map(load_images, num_parallel_calls=AUTOTUNE)
.map(augment, num_parallel_calls=AUTOTUNE)
.cache('train_cache')
.batch(batch_size, drop_remainder=True)
.prefetch(AUTOTUNE)
.repeat(2) # <-- put last in the list
)
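The corresponding fit call can then drop batch_size and steps_per_epoch altogether (a minimal sketch, assuming the same model, epochs, and valDS as before):
# Sketch only: the batched, repeated dataset already defines the batches,
# so neither batch_size nor steps_per_epoch is passed to fit().
hist = model.fit(trainDS,
                 epochs=epochs,
                 validation_data=valDS)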
Hmm, maybe you should not be explicitly defining batch_size and steps_per_epoch in model.fit(...). Regarding the batch_size argument of model.fit(...), the docs state:
[...] Do not specify the batch_size if your data is in the form of datasets,
generators, or keras.utils.Sequence instances (since they generate
batches).
This seems to work:
import tensorflow as tf
x = tf.random.normal((1000, 1))
y = tf.random.normal((1000, 1))
ds = tf.data.Dataset.from_tensor_slices((x, y)).repeat(2)
ds = ds.shuffle(2000).cache('train_cache').batch(15, drop_remainder=True).prefetch(tf.data.AUTOTUNE)
val_ds = tf.data.Dataset.from_tensor_slices((tf.random.normal((300, 1)), tf.random.normal((300, 1))))
val_ds = val_ds.shuffle(300).cache('val_cache').batch(15, drop_remainder=True).prefetch(tf.data.AUTOTUNE)
inputs = tf.keras.layers.Input(shape = (1,))
x = tf.keras.layers.Dense(10, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(ds, validation_data=val_ds, epochs = 5)
Epoch 1/5
133/133 [==============================] - 1s 4ms/step - loss: 1.0355 - val_loss: 1.1205
Epoch 2/5
133/133 [==============================] - 0s 3ms/step - loss: 0.9847 - val_loss: 1.1050
Epoch 3/5
133/133 [==============================] - 0s 3ms/step - loss: 0.9810 - val_loss: 1.0982
Epoch 4/5
133/133 [==============================] - 0s 3ms/step - loss: 0.9792 - val_loss: 1.0937
Epoch 5/5
133/133 [==============================] - 0s 3ms/step - loss: 0.9779 - val_loss: 1.0903
<keras.callbacks.History at 0x7f3acb3e5ed0>
133 * batch_size = 1995 --> the remainder was dropped.
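To double-check that step count (a sketch against the ds defined above): repeat(2) gives 2000 samples, and batching by 15 with drop_remainder=True leaves 133 full batches.
# Sanity check on the batched dataset defined above.
print(ds.cardinality().numpy())  # 133 batches per epoch
print(133 * 15)                  # 1995 samples used; the last 5 are dropped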