Accessing filenames and preprocessing data inside a queue in TensorFlow

Question

I have a directory of image files and corresponding info files, which contain some further information about each image. It looks like this:

data/images/001.png

data/info/001.txt

data/info2/001.txt

I can load the images using...

filename_queue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
_, value = reader.read(filename_queue)

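...but I also need the information from the associated files. In fact, I have some functions that take the image filename as input, work out the info file paths, preprocess the data, and return a NumPy array. However, I am confused about how to pass this through the queue. It seems I need access to the filename inside the queue in order to call my function at each step.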

Thanks!

Answer 1

You can use tf.py_func() to contain your Python/NumPy preprocessing of the filenames, bearing in mind that reader.read() returns both a key (the filename) and a value (the file contents), e.g.:

import os
import numpy
import tensorflow as tf  # TF 1.x queue-based input pipeline

def my_preprocessing_from_filename(filename):
    # tf.py_func passes string tensors in as bytes under Python 3; decode first
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8")
    # This is your pre-processing, e.g.:
    image_name = os.path.splitext(os.path.basename(filename))[0]
    image_info_path = os.path.join("data/info", "{}.txt".format(image_name))
    image_info = numpy.loadtxt(image_info_path, dtype=numpy.int64)
    # ... or whatever you do to load/process the info
    return image_info

filename_queue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)  # key: filename, value: file contents
my_image_info = tf.py_func(my_preprocessing_from_filename, [key], tf.int64)
# ...
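The question also mentions a second info directory (data/info2). As a minimal sketch, assuming each info directory is a sibling of data/images and every info file holds whitespace-separated integers, the path mapping and loading can be factored into plain Python helpers that tf.py_func then wraps. The names info_paths_from_image and load_all_info are hypothetical.

```python
import os
import numpy as np

def info_paths_from_image(filename, info_dirs=("info", "info2")):
    """Map .../images/001.png to [.../info/001.txt, .../info2/001.txt].

    Hypothetical helper; assumes the info directories sit next to images/.
    """
    # tf.py_func hands string tensors over as bytes, so accept both types
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8")
    images_dir, base = os.path.split(filename)
    data_dir = os.path.dirname(images_dir)
    stem = os.path.splitext(base)[0]
    return [os.path.join(data_dir, d, stem + ".txt") for d in info_dirs]

def load_all_info(filename):
    """Load every info file for one image into a single flat int64 array."""
    # ndmin=1 keeps single-value files 1-D so they can be concatenated
    arrays = [np.loadtxt(p, dtype=np.int64, ndmin=1)
              for p in info_paths_from_image(filename)]
    return np.concatenate(arrays)
```

With these helpers the graph side stays a single call, tf.py_func(load_all_info, [key], tf.int64). Concatenating into one int64 array keeps a single output type; if the files held different dtypes, tf.py_func also accepts a list for Tout, and the function could return one array per file instead.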

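Note: depending on your preprocessing, you might also consider porting it to TensorFlow operations, e.g. using TF string_ops methods to derive the info filenames from the image filenames: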

def my_tf_preprocessing_from_filename(filename):
    # filename: rank-1 string tensor holding a single path, e.g. [key]
    # Get basename:
    image_name = tf.string_split(filename, delimiter='/').values[-1]
    # Remove ext (supposing no other "." in name):
    image_name = tf.string_split([image_name], delimiter='.').values[0]
    image_info_path = tf.reduce_join(["data/info/", image_name, ".txt"])
    # Read the info file itself (not the next item from the image queue):
    info_value = tf.read_file(image_info_path)
    # ... further pre-process your info
    return info_value

filename_queue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
my_image_info = my_tf_preprocessing_from_filename([key])
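To check what the string_ops steps above compute, here is a plain-Python mirror of the same three steps. The function name info_path_like_tf_ops is hypothetical, and it assumes plain "/"-separated paths like those in the question. It also shows why the "no other '.' in name" caveat matters: keeping the first "."-separated token drops everything after the first dot, whereas os.path.splitext strips only the last extension.

```python
import os

def info_path_like_tf_ops(filename):
    """Plain-Python mirror of the TF string_ops steps (hypothetical helper)."""
    # Basename: last "/"-separated token, like tf.string_split(..., '/').values[-1]
    image_name = filename.split("/")[-1]
    # Drop extension: first "."-separated token, like tf.string_split(..., '.').values[0]
    image_name = image_name.split(".")[0]
    # Concatenate with no separator, like tf.reduce_join's default separator=''
    return "".join(["data/info/", image_name, ".txt"])

print(info_path_like_tf_ops("data/images/001.png"))     # data/info/001.txt
print(info_path_like_tf_ops("data/images/img.v2.png"))  # data/info/img.txt
print(os.path.splitext("img.v2.png")[0])                # img.v2
```

One difference to keep in mind: tf.string_split skips empty tokens by default while str.split does not, so the two can disagree on paths containing "//" or names starting with a dot.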


Tags: python, tensorflow, tensorflow-datasets