Unexpected shape returned from function mapped to tensorflow dataset

Question

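I am trying to pad a dataset whose tensors have varying lengths in the first dimension. The length can be any number up to 13, and I want to pad with zeros at the front of the first dimension. The function seems to work correctly when I apply it to a single tensor, but the mapped dataset does not report the shape (13, 128) that I expected. Instead I get a shape of (None, None).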

The code is as follows:

print(train_dataset_filtered.element_spec, '\n')

def pad_seq(eng, ger):
    n = 13 - tf.shape(eng)[0]
    paddings = tf.concat(([[n,0]], [[0,0]]), axis=0)
    return tf.pad(eng, paddings), ger

print(pad_seq(tf.ones((4,128)), tf.ones((14,))), '\n')

print(train_dataset_filtered.map(pad_seq).element_spec)

The output is as follows:

(TensorSpec(shape=(None, 128), dtype=tf.float32, name=None), TensorSpec(shape=(14,), dtype=tf.int32, name=None)) 

(<tf.Tensor: id=402, shape=(13, 128), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]], dtype=float32)>, <tf.Tensor: id=388, shape=(14,), dtype=float32, numpy=
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
      dtype=float32)>) 

(TensorSpec(shape=(None, None), dtype=tf.float32, name=None), TensorSpec(shape=(14,), dtype=tf.int32, name=None))

Answer 1

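This behavior is caused by the shape of n = 13 - tf.shape(eng)[0] when it is used with concat().

In your example, n is a scalar tensor: tf.Tensor(9, shape=(), dtype=int32). The best approach is therefore to build the paddings with stack() instead. Change your code to: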

def pad_seq(eng, ger):
    n = 13 - tf.shape(eng)[0]
    paddings = tf.stack(([[n,0], [0,0]]))
    return (tf.pad(eng, paddings), ger)
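
If element_spec has to report a fully static (13, 128), one option is to assert that shape with tf.ensure_shape after padding. Because n is only known at run time, the static first dimension may still be reported as None even when stack() is used. The following is a minimal sketch, assuming TensorFlow 2.4 or later (for output_signature); make_dataset and MAX_LEN are hypothetical stand-ins for train_dataset_filtered and the constant 13, not names from the original code.

```python
import numpy as np
import tensorflow as tf

MAX_LEN = 13  # hypothetical name for the target length used in the question


def make_dataset():
    # Hypothetical stand-in for train_dataset_filtered:
    # sequences with 1..13 rows of 128 features, plus a (14,) int32 tensor.
    def gen():
        for length in range(1, MAX_LEN + 1):
            eng = np.ones((length, 128), dtype=np.float32)
            ger = np.zeros((14,), dtype=np.int32)
            yield eng, ger

    return tf.data.Dataset.from_generator(
        gen,
        output_signature=(
            tf.TensorSpec(shape=(None, 128), dtype=tf.float32),
            tf.TensorSpec(shape=(14,), dtype=tf.int32),
        ),
    )


def pad_seq(eng, ger):
    # Number of zero rows to add in front of the first dimension.
    n = MAX_LEN - tf.shape(eng)[0]
    paddings = tf.stack([[n, 0], [0, 0]])
    padded = tf.pad(eng, paddings)
    # Declare the shape the padded result must have; this also checks it
    # at run time and raises an error if an element does not match.
    padded = tf.ensure_shape(padded, (MAX_LEN, 128))
    return padded, ger


padded_dataset = make_dataset().map(pad_seq)
print(padded_dataset.element_spec)
```

The tf.ensure_shape call only states what the padding already guarantees, so it should not change any values. It fails at run time if an input has more than 13 rows, which in the original pipeline is presumably ruled out by the earlier filtering step.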

I also noticed that you have not accepted any answers on SO. Please see "What should I do when someone answers my question?"

tensorflow2.0

tensorflow-datasets