How to use tf.data.Dataset.padded_batch with a nested shape?

Question

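I am building a dataset in which each element contains two tensors, of shape [batch,width,height,3] and [batch,class]. For simplicity, assume class = 5.

What shape do you pass to dataset.padded_batch(1000,shape) so that the images are padded along the width/height/3 axes?

I have tried the following: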

tf.TensorShape([[None,None,None,3],[None,5]])
[tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5])]
[[None,None,None,3],[None,5]]
([None,None,None,3],[None,5])
(tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5]))

Every one of them raises a TypeError.

The docs state:

padded_shapes: A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. tf.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
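To make the quoted rule concrete, here is a minimal NumPy sketch of what "padded to the maximum size of that dimension in each batch" means for a single component. The helper `pad_to_batch_max` is hypothetical and only mimics the behaviour described in the docs; it is not how TensorFlow implements it.

```python
import numpy as np

def pad_to_batch_max(arrays, pad_value=0.0):
    """Pad same-rank arrays to the per-dimension maximum, then stack them."""
    # Per-dimension maximum over all arrays in this (hypothetical) batch.
    target = np.max([a.shape for a in arrays], axis=0)
    padded = []
    for a in arrays:
        # Pad only at the end of each axis, up to the batch maximum.
        pad_width = [(0, int(t) - s) for s, t in zip(a.shape, target)]
        padded.append(np.pad(a, pad_width, mode="constant", constant_values=pad_value))
    return np.stack(padded)

# Two made-up image tensors whose first three dimensions differ.
images = [np.ones((1, 4, 6, 3), np.float32), np.ones((2, 8, 5, 3), np.float32)]
batch = pad_to_batch_max(images)
print(batch.shape)  # (2, 2, 8, 6, 3)
```

Each element is padded to [2, 8, 6, 3], the largest size seen on every axis, and stacking then adds the new leading batch dimension.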

Relevant code:

dataset = tf.data.Dataset.from_generator(generator,tf.float32)
shapes = (tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5]))
batch = dataset.padded_batch(1,shapes)

Answer 1

TensorShape does not accept nested lists. tf.TensorShape([None, None, None, 3, None, 5]) and TensorShape(None) (note that there is no []) are legal.
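A short sketch of that point, assuming a TensorFlow 2.x install: a single TensorShape describes one tensor, so a nested structure has to be written as a Python tuple of separate TensorShape objects. The exact exception type for the nested list may differ between versions, so the sketch catches both TypeError and ValueError.

```python
import tensorflow as tf

# One TensorShape per tensor; the nesting lives in the surrounding tuple.
image_shape = tf.TensorShape([None, None, None, 3])
label_shape = tf.TensorShape([None, 5])
padded_shapes = (image_shape, label_shape)
print(image_shape.as_list(), label_shape.as_list())

# A fully unknown shape is also legal: no list at all.
unknown_shape = tf.TensorShape(None)
print(unknown_shape.dims is None)

# A nested list inside a single TensorShape is rejected.
nested_rejected = False
try:
    tf.TensorShape([[None, None, None, 3], [None, 5]])
except (TypeError, ValueError) as err:
    nested_rejected = True
    print("rejected:", type(err).__name__)
```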

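That said, combining these two tensors sounds odd to me. I'm not sure what you are trying to accomplish, but I would suggest trying to do it without combining tensors of different dimensions.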

Answer 2

Thanks to marry for finding the solution. It turns out that the types passed to from_generator must match the number of tensors in each element.

New code:

dataset = tf.data.Dataset.from_generator(generator,(tf.float32,tf.float32))
shapes = (tf.TensorShape([None,None,None,3]),tf.TensorShape([None,5]))
batch = dataset.padded_batch(1,shapes)
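For completeness, here is a self-contained sketch of the fixed code that can be run end to end. It assumes TensorFlow 2.x with eager execution, and the generator and its sizes are made up for illustration. Recent TensorFlow releases mark the positional types argument of from_generator as deprecated in favour of output_signature, but the form used in the answer should still work.

```python
import numpy as np
import tensorflow as tf

def generator():
    # Hypothetical data: each element is an (images, labels) pair whose
    # leading size, width and height vary from element to element.
    for n, (w, h) in enumerate([(4, 6), (8, 5), (3, 3)], start=1):
        images = np.ones((n, w, h, 3), dtype=np.float32)
        labels = np.ones((n, 5), dtype=np.float32)
        yield images, labels

# One dtype per tensor in the element, matching the structure of the shapes.
dataset = tf.data.Dataset.from_generator(generator, (tf.float32, tf.float32))
shapes = (tf.TensorShape([None, None, None, 3]), tf.TensorShape([None, 5]))
batched = dataset.padded_batch(3, shapes)

# Take the first (and only) batch and inspect the padded shapes.
images, labels = next(iter(batched))
print(images.shape, labels.shape)
```

With these three elements, every image tensor is padded to [3, 8, 6, 3] and every label tensor to [3, 5] before the batch dimension is added in front.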
