Create an int list feature to save as tfrecord in tensorflow?

Question

How do I create a TensorFlow record from a list?

From the documentation here it seems possible. There's also this example where they convert a numpy array to a byte array using numpy's .tostring(). However, when I try to pass in:

labels = np.asarray([[1,2,3],[4,5,6]])
...
example = tf.train.Example(features=tf.train.Features(feature={
    'height': _int64_feature(rows),
    'width': _int64_feature(cols),
    'depth': _int64_feature(depth),
    'label': _int64_feature(labels[index]),
    'image_raw': _bytes_feature(image_raw)}))
writer.write(example.SerializeToString())

I get the error:

TypeError: array([1, 2, 3]) has type <type 'numpy.ndarray'>, but expected one of: (<type 'int'>, <type 'long'>)

This doesn't help me figure out how to store a list of integers in a tfrecord. I have tried looking through the docs.

Answer 1

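As I understand it, you want to store a list of integers in a tfrecord. According to the documentation, a feature can hold one of the packed types BytesList, FloatList, or Int64List: https://github.com/tensorflow/tensorflow/blob/r0.9/tensorflow/core/example/example.proto

If you look at the example, they use a function _int64_feature that wraps the value passed to it in a list: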

    def _int64_feature(value):
      return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

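In your case you are passing a list as the value to _int64_feature, so the function wraps it in a second list and the call fails.

Use the line below instead, which fixes the error when storing a list of int values, or modify the function above to suit your needs.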

'label': tf.train.Feature(int64_list=tf.train.Int64List(value=labels[index]))
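
As one way of "modifying the function to suit your needs", here is a minimal sketch of a helper that accepts either a single int or a flat sequence of ints. The name int64_list_feature is hypothetical (it is not part of TensorFlow), and the sketch assumes integer input and a TensorFlow version where tf.train.Feature and tf.train.Int64List are available.

```python
import numpy as np
import tensorflow as tf


def int64_list_feature(value):
    """Hypothetical helper: build an Int64List feature from an int or a flat sequence of ints."""
    # np.atleast_1d turns a scalar into a length-1 array and leaves 1-D input unchanged.
    arr = np.atleast_1d(np.asarray(value))
    if arr.ndim != 1:
        raise ValueError("expected a scalar or a flat sequence, got shape %s" % (arr.shape,))
    # tolist() yields plain Python ints, so no numpy types reach the protobuf layer.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=arr.tolist()))


labels = np.asarray([[1, 2, 3], [4, 5, 6]])
row_feature = int64_list_feature(labels[0])   # one row of labels
scalar_feature = int64_list_feature(28)       # a single int, such as 'height'
```

With a helper like this, the same function can serve 'height', 'width', 'depth', and 'label' in the question's feature dict.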

Hope this helps.

Answer 2

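After some time researching and looking further through the documentation, I found my own answer. Taking the function from the example code as a starting point: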

def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
...
'label': _int64_feature(labels[index]),

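labels[index] is wrapped in a list by [value], so you end up with [np.array([1,2,3])], which causes the error.

The wrapping is necessary in the example because tf.train.Int64List() expects a list or numpy array, and the example passes in a single integer, so they wrap it in a list.
In the example it looks like this: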

label = [1,2,3,4]
...
'label': _int64_feature(label[index]) 

tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
# Here [value] evaluates to [1]

If you want to pass in a list, do this:

labels = np.asarray([[1,2,3],[4,5,6]])
...
def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
...
'label': _int64_feature(labels[index]),

I'll probably open a pull request, because I found the original documentation for tf.train.Feature to be almost nonexistent.

TL;DR

Pass a list or numpy array to tf.train.Int64List(), but not a list of lists or a list of numpy arrays.
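
To illustrate the TL;DR end to end, here is a minimal round-trip sketch. It assumes TensorFlow 2.x with eager execution, so it uses tf.io.TFRecordWriter, tf.data.TFRecordDataset, and tf.io.parse_single_example rather than the r0.9-era API discussed in this thread. The file name and the fixed row length of 3 are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

labels = np.asarray([[1, 2, 3], [4, 5, 6]])
path = os.path.join(tempfile.mkdtemp(), "labels.tfrecord")  # hypothetical file name

# Write one Example per row; each row is passed as a flat list of ints.
with tf.io.TFRecordWriter(path) as writer:
    for row in labels:
        feature = tf.train.Feature(int64_list=tf.train.Int64List(value=row.tolist()))
        example = tf.train.Example(features=tf.train.Features(feature={"label": feature}))
        writer.write(example.SerializeToString())

# Read the records back. FixedLenFeature([3], ...) assumes every row holds exactly 3 ints.
spec = {"label": tf.io.FixedLenFeature([3], tf.int64)}
dataset = tf.data.TFRecordDataset(path)
restored = [
    tf.io.parse_single_example(record, spec)["label"].numpy().tolist()
    for record in dataset
]
```

If the rows had different lengths, a fixed-length spec would not fit and a variable-length feature spec would be needed instead.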

Answer 3

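Int64List, BytesList, and FloatList expect an iterable of the underlying elements (a repeated field). In your _int64_feature function you use a list as that iterable.

When you pass a scalar, your _int64_feature creates an array with one int64 element in it (exactly as expected). But when you pass an ndarray, you create a list containing an ndarray and pass it to a function that expects a list of int64.

So just remove the array construction from your function: int64_list=tf.train.Int64List(value=value)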

Answer 4

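One option is to change value=[value] to value=value. But if you want to pass a list of lists or a list of numpy arrays, which is common if, say, you want to save the x, y, z coordinates of all atoms in a molecule, you can flatten the array first and then use value=value. For example,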

    array_1 = np.array([[1,2,3],[2,3,4]]).ravel()

If you want to restore the original shape when reading the tfrecord file or during training, you can use reshape:

    array_1 = array_1.reshape([2,3])
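
The flatten-then-reshape idea can be sketched as follows. Storing the original shape in a second feature named "shape" is an assumption added here so the reader does not have to hard-code [2,3]; it is not something the example above does. The sketch keeps integer data to match the array above (real coordinates would more likely be floats in a FloatList) and parses the serialized bytes directly with the protobuf method FromString instead of writing a file.

```python
import numpy as np
import tensorflow as tf

coords = np.array([[1, 2, 3], [2, 3, 4]])


def _int64_feature(value):
    # value must already be a flat list of ints
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


example = tf.train.Example(features=tf.train.Features(feature={
    "coords": _int64_feature(coords.ravel().tolist()),  # flattened data
    "shape": _int64_feature(list(coords.shape)),        # hypothetical extra feature
}))
serialized = example.SerializeToString()

# Parse the bytes back into an Example and rebuild the 2-D array.
parsed = tf.train.Example.FromString(serialized)
feat = parsed.features.feature
shape = list(feat["shape"].int64_list.value)
restored = np.array(list(feat["coords"].int64_list.value)).reshape(shape)
```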

python

tensorflow