TensorFlow: does tf.train.batch automatically load the next batch when the batch has finished training?
For example, after I have created my ops, fed the batch data through the op, and run the op, does tf.train.batch automatically feed another batch of data to the session?
I ask this because tf.train.batch has an allow_smaller_final_batch attribute, which makes it possible for the final batch to be loaded with a size smaller than the indicated batch size. Does this mean that even without a loop, the next batch could be automatically fed? The tutorial code leaves me confused. When I load a single batch, I get literally a single batch of shape [batch_size, height, width, num_channels], but the documentation says it "Creates batches of tensors in tensors."
Also, when I read the tutorial code in the tf-slim walkthrough tutorial, where there is a function called load_batch, only 3 tensors are returned: images, images_raw, labels. Where are the 'batches' of data as explained in the documentation?
Thank you for your help.
... does tf.train.batch automatically feed in another batch of data to the session?
No. Nothing happens automatically. You have to call sess.run(...) again to load a new batch.
Does this mean even without a loop, the next batch could be automatically fed?
No. tf.train.batch(..) will always load batch_size tensors. If you have, for example, 100 images and batch_size=30, you will get 3*30 batches, meaning you can call sess.run(batch) three times before the input queue starts over from the beginning (or stops, if num_epochs=1). This means you miss out on 100-3*30=10 training samples. If you do not want to miss them, use tf.train.batch(..., allow_smaller_final_batch=True); then you will get 3x 30-sample batches and 1x 10-sample batch before the input queue restarts.
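A minimal sketch of that counting behavior, assuming the TF 1.x queue-runner API (tf.train.range_input_producer is just a stand-in here for a real input pipeline):

import tensorflow as tf  # TF 1.x

# 100 fake scalar samples, read for exactly one epoch
sample = tf.train.range_input_producer(100, num_epochs=1, shuffle=False).dequeue()
batch = tf.train.batch([sample], batch_size=30, allow_smaller_final_batch=True)[0]

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # num_epochs is tracked in a local variable
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            print(sess.run(batch).shape)  # prints (30,) three times, then (10,)
    except tf.errors.OutOfRangeError:
        pass  # the input queue is exhausted after one epoch
    finally:
        coord.request_stop()
        coord.join(threads)

With allow_smaller_final_batch=False (the default) the (10,) batch would never appear; the loop would simply stop after three batches.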
Let me also elaborate with a code example:
queue = tf.train.string_input_producer(filenames,
        num_epochs=1) # only iterate through all samples in dataset once
reader = tf.TFRecordReader() # or any reader you need
_, example = reader.read(queue)
image, label = your_conversion_fn(example)

# images, labels will now load up to 100 image-label pairs on sess.run(...)
# most TF ops are tuned to work on batches
# this is faster and also gives better results, e.g. for gradient calculation
images, labels = tf.train.batch([image, label], batch_size=100)

with tf.Session() as sess:
    # "boilerplate" code
    sess.run([
        tf.local_variables_initializer(),
        tf.global_variables_initializer(),
    ])

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        # in most cases coord.should_stop() will return True
        # when there are no more samples to read
        # if num_epochs=None then it will run forever
        while not coord.should_stop():
            # will start reading, working data from input queue
            # and "fetch" the results of the computation graph
            # into raw_images and raw_labels
            raw_images, raw_labels = sess.run([images, labels])
    finally:
        coord.request_stop()
        coord.join(threads)
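your_conversion_fn is a placeholder in the snippet above. A hedged sketch of what it might look like for TFRecords that store a JPEG-encoded image and an integer label (the feature keys 'image/encoded' and 'image/label' and the 224x224 size are illustrative assumptions, not part of the original answer):

def your_conversion_fn(example):
    # parse one serialized tf.train.Example; the feature keys are hypothetical
    features = tf.parse_single_example(example, features={
        'image/encoded': tf.FixedLenFeature([], tf.string),
        'image/label': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
    # tf.train.batch needs statically known shapes, so fix the size here
    image = tf.image.resize_images(image, [224, 224])
    label = tf.cast(features['image/label'], tf.int32)
    return image, label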
You need to call sess.run, passing the batch tensors to it, every time you want to load the next batch. See the code below.
img = [0,1,2,3,4,5,6,7,8]
lbl = [0,1,2,3,4,5,6,7,8]
images = tf.convert_to_tensor(img)
labels = tf.convert_to_tensor(lbl)
input_queue = tf.train.slice_input_producer([images, labels])
sliced_img = input_queue[0]
sliced_lbl = input_queue[1]

img_batch, lbl_batch = tf.train.batch([sliced_img, sliced_lbl], batch_size=3)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(0, 3):  # fetch three batches
        image_batch, label_batch = sess.run([img_batch, lbl_batch])
        print(image_batch, label_batch)

    coord.request_stop()
    coord.join(threads)
The output looks something like this:
[4 1 8] [4 1 8]
[2 3 7] [2 3 7]
[2 6 8] [2 6 8]
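Note that the samples come out in random order because tf.train.slice_input_producer shuffles by default (shuffle=True). If you replace the corresponding lines in the example above with the variant below, the slices come out in their original order:

input_queue = tf.train.slice_input_producer([images, labels], shuffle=False)
img_batch, lbl_batch = tf.train.batch([input_queue[0], input_queue[1]], batch_size=3)
# successive sess.run([img_batch, lbl_batch]) calls now return
# [0 1 2] [0 1 2], then [3 4 5] [3 4 5], then [6 7 8] [6 7 8]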
I modified the code from https://github.com/tensorflow/models/blob/master/research/slim/slim_walkthrough.ipynb and bodokaiser's answer from the above post. Please note that this is from the evaluation script in https://github.com/tensorflow/models/tree/master/research/slim, eval_image_classifier.py. The most important modification to the eval_image_classifier.py code is adding num_epochs=1 to the DatasetDataProvider line. That way, all images are visited exactly once for inference.
provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    shuffle=False,
    common_queue_capacity=2 * FLAGS.batch_size,
    common_queue_min=FLAGS.batch_size,
    num_epochs=1)
[image, label] = provider.get(['image', 'label'])
images, labels = tf.train.batch(
    [image, label],
    batch_size=FLAGS.batch_size,
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=1 * FLAGS.batch_size)

with tf.Session() as sess:
    sess.run([tf.local_variables_initializer(),
              tf.global_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            np_image, np_label = sess.run([images, labels])
    except tf.errors.OutOfRangeError:
        # raised once the single epoch of data is exhausted
        coord.request_stop()
        coord.join(threads)
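Since num_epochs=1 ends the loop after a single pass, you can aggregate per-batch results over the whole dataset. A hedged sketch of such an accumulation, here just counting images (note that unless you also pass allow_smaller_final_batch=True to tf.train.batch, a final partial batch is silently dropped):

num_images = 0
with tf.Session() as sess:
    sess.run([tf.local_variables_initializer(),
              tf.global_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            np_image, np_label = sess.run([images, labels])
            num_images += np_label.shape[0]  # normally FLAGS.batch_size
    except tf.errors.OutOfRangeError:
        pass  # one full pass over the dataset is done
    finally:
        coord.request_stop()
        coord.join(threads)
print('Processed %d images in one epoch' % num_images)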