What is the recommended way to perform index lookups in a dictionary when working with batches via the Dataset API in TensorFlow?

Question

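I am currently refactoring existing code to use the newer TF Dataset API. In our current pipeline, we populate a standard Python dictionary with product IDs and class IDs.

I have now moved the images/paths into a TF Dataset and use tf.string_split to extract various pieces of information from the file name itself. One of these is the product_id. At this point product_id is a TF tensor, so I can no longer do the lookup the way we did before, with "if product_id in products_to_class", because a tensor cannot be used to search a standard dictionary.

I am using this project to learn how to improve performance, so I would like to know what the "best/recommended" approach is here when working with batches in the TF Dataset API. Should I convert product_id to a string and do the lookup with the existing if check above, or should I convert the products_to_class dictionary into another data structure (for example, another dataset) and perform the lookup with tensors throughout? Any advice would be greatly appreciated.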

A small example of what I currently have:

import tensorflow as tf

products_to_class = {'12345': 0, '67890': 1}

# The logic below is in a function mapped over a tf.data.Dataset
def _parse_fn(filename, label):
  # Last path component (Windows separator), then the part before the first dot
  core_file = tf.string_split([filename], '\\').values[-1]
  product_id = tf.string_split([core_file], ".").values[0]

  # Unable to perform the check below because product_id is now a tensor and
  # products_to_class is a Python dictionary
  if product_id in products_to_class:
    label = products_to_class[product_id]

Answer 1

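The built-in TensorFlow mechanism for doing this is a tf.contrib.lookup table. For example, if you have a list of string keys that you want to map to dense integers, you can define the following outside your _parse_fn():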

# This constructor creates a lookup table that implicitly maps each string in the
# argument to its index in the list (e.g. '67890' -> 1).
products_to_class = tf.contrib.lookup.index_table_from_tensor(['12345', '67890'])

...and then use products_to_class.lookup() in your _parse_fn():

def _parse_fn(filename, label):
  # The backslash separator must be escaped inside a Python string literal.
  core_file = tf.string_split([filename], '\\').values[-1]
  product_id = tf.string_split([core_file], ".").values[0]

  # Returns a `tf.Tensor` that corresponds to the value associated with
  # `product_id` in the `products_to_class` table.
  label = products_to_class.lookup(product_id)

  # ...
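To make the implicit mapping concrete, here is a plain-Python model of what the table returns. It is only a sketch of the semantics, not TensorFlow code: the names index_of and lookup are made up for illustration, and it assumes the table is created with its default arguments, under which a key that is not in the list maps to -1.

```python
# Pure-Python model of the lookup table; illustrative only, not TensorFlow code.
keys = ['12345', '67890']

# Each key maps to its position in the list, e.g. '67890' -> 1.
index_of = {key: i for i, key in enumerate(keys)}

def lookup(product_id, default_value=-1):
    # Unknown product IDs fall back to default_value instead of raising.
    return index_of.get(product_id, default_value)
```

In this model the "if product_id in products_to_class" check from the question becomes a comparison of the returned label against -1.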

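Note that using a lookup table imposes two additional constraints on your program:

You must use Dataset.make_initializable_iterator() instead of Dataset.make_one_shot_iterator().
You must call sess.run(tf.tables_initializer()) before you start consuming elements from the input pipeline.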

If you use the high-level tf.estimator API and return a tf.data.Dataset from your input_fn, both of these are handled for you.
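The answer above targets TensorFlow 1.x, and tf.contrib was removed in TensorFlow 2.x. As a hedged sketch, assuming TF 2.x with eager execution, a similar lookup can be built directly from the original dictionary with tf.lookup.StaticHashTable. The file names below are hypothetical, _parse_fn takes only the file name to keep the example short, and -1 is an arbitrary choice for product IDs that are missing from the dictionary.

```python
import tensorflow as tf

# Same mapping as in the question.
prod_to_class = {'12345': 0, '67890': 1}

# Build an immutable hash table from the Python dictionary.
products_to_class = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(list(prod_to_class.keys())),
        values=tf.constant(list(prod_to_class.values()), dtype=tf.int64)),
    default_value=-1)  # assumed sentinel for unknown product IDs

def _parse_fn(filename):
    # Last path component (Windows separator), then the part before the first dot.
    core_file = tf.strings.split(filename, '\\')[-1]
    product_id = tf.strings.split(core_file, '.')[0]
    label = products_to_class.lookup(product_id)
    return filename, label

# Hypothetical file names, for illustration only.
filenames = ['C:\\images\\12345.jpg',
             'C:\\images\\67890.jpg',
             'C:\\images\\99999.jpg']
dataset = tf.data.Dataset.from_tensor_slices(filenames).map(_parse_fn).batch(2)

# Collect the looked-up labels across all batches.
labels = [int(l) for _, batch in dataset for l in batch.numpy()]
```

Under eager execution the table is initialized when it is constructed, so neither an initializable iterator nor a tables_initializer() call appears in this sketch.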

tensorflow

tensorflow-datasets