Tensorflow reshape tensor gives None dimension

Question

I am using the model described here on the 0.6.0 branch. The code can be found here. I made some minor changes to the linked code.

In my code I create two models, one for training and one for validation, very similar to what is done in the TensorFlow tutorial:

with tf.variable_scope("model", reuse=None, initializer=initializer):
    m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
    mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')

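The first model, the one used for training, seems to be created just fine, but the second one, used for validation, is not. Its output gets a None dimension! I'm referring to line 134 of the linked code: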

output = tf.reshape(tf.concat(1, outputs), [-1, size])

Immediately after reshaping the output, I added these lines:

output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])

This gives me:

Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)

This problem only occurs for the 'validation model', not for the 'training model'. For the 'training model' I get the expected output:

Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)

(Note that for the 'validation model' I use batch_size=1 instead of the batch_size=2 used for the training model.)

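As I understand it, passing -1 in the shape argument of the reshape function makes it compute the output shape automatically! So why do I get None? Nothing in the config I pass to the model has a None value.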

Thanks for any help and tips!

Answer 1

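TL;DR: A dimension of None simply means that shape inference could not determine the exact shape of the output tensor at graph-construction time. When you run the graph, the tensor will have the appropriate run-time shape.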

If you're not interested in how shape inference works, you can stop reading now.

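Shape inference applies local rules, based on a "shape function" that takes the shapes of an op's inputs and computes the (possibly incomplete) shapes of the op's outputs. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs and work backwards: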

The shape argument to tf.reshape() contains a -1, which means "figure out the output shape automagically" based on the shape of the tensor input.
The tensor input is the output of the tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() op and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
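The backward walk above can be reproduced with a small pure-Python model of static shape inference. This is a simplified sketch, not TensorFlow's actual shape functions: `concat_shape` and `reshape_shape` are hypothetical helpers, `None` stands for an unknown (`?`) dimension, and the per-step output shapes `[2, 650]` (training) and `[None, 650]` (validation) are assumed from the numbers printed in the question.

```python
from math import prod
from typing import List, Optional

# Static shape: one entry per dimension; None marks a dimension unknown at graph-construction time.
Shape = List[Optional[int]]


def concat_shape(shapes: List[Shape], axis: int) -> Shape:
    """Hypothetical, simplified static shape rule for concatenating along `axis`."""
    result: Shape = list(shapes[0])
    for s in shapes[1:]:
        for i, b in enumerate(s):
            a = result[i]
            if i == axis:
                # Sizes along the concat axis add up; unknown if either side is unknown.
                result[i] = None if a is None or b is None else a + b
            elif a is None:
                # Other axes must match, so a known size on either side fixes the dimension.
                # (Mismatched known sizes are not checked in this sketch.)
                result[i] = b
    return result


def reshape_shape(input_shape: Shape, target: List[int]) -> Shape:
    """Hypothetical static shape rule for reshape(input, target) with at most one -1."""
    if -1 not in target:
        return list(target)
    if any(d is None for d in input_shape):
        # The total element count is unknown, so the -1 cannot be resolved yet.
        return [None if d == -1 else d for d in target]
    known = prod(d for d in target if d != -1)
    inferred = prod(input_shape) // known
    return [inferred if d == -1 else d for d in target]


num_steps, size = 400, 650
# Training model: every per-step output is [2, 650] (batch_size=2).
print(reshape_shape(concat_shape([[2, size]] * num_steps, axis=1), [-1, size]))     # [800, 650]
# Validation model: every per-step output is [None, 650] (the shape inferred when batch_size=1).
print(reshape_shape(concat_shape([[None, size]] * num_steps, axis=1), [-1, size]))  # [None, 650]
# At run time the validation input really is [1, 400 * 650], so -1 resolves to 400.
print(reshape_shape([1, num_steps * size], [-1, size]))                             # [400, 650]
```

The last call models run time: once the actual element count is known, the -1 resolves to num_steps * batch_size, which is why the None only exists in the static shape.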

The tf.mul() op performs NumPy-style broadcasting. Only dimensions of size 1 are broadcast. Consider three cases where we compute tf.mul(x, y):

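If x has shape [1, 10] and y has shape [5, 10], broadcasting will happen, and the output shape is [5, 10].
If x has shape [1, 10] and y has shape [1, 10], no broadcasting happens, and the output shape is [1, 10].
However, if x has shape [1, 10] and y has shape [?, 10], there is not enough static information to tell whether broadcasting will happen (even if we happen to know that case 2 applies at run time).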

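Therefore, when batch_size is 1, the tf.mul() op produces an output of shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size], because the unknown ? dimension must then either equal batch_size or be 1 and be broadcast to batch_size.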
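The three cases, and the resulting difference between batch_size=1 and batch_size=2, can be checked with a minimal sketch of static broadcasting for two same-rank shapes. `broadcast_static_shape` is a hypothetical helper, not a TensorFlow API; it assumes equal ranks and uses `None` for `?`. The value `hidden_size = 650` is taken from the `size` printed in the question.

```python
from typing import List, Optional

# None marks a dimension that is unknown at graph-construction time (shown as ? in the text).
Shape = List[Optional[int]]


def broadcast_static_shape(x: Shape, y: Shape) -> Shape:
    """Hypothetical static shape rule for an elementwise op such as tf.mul(x, y).

    Assumes x and y have the same rank; only size-1 dimensions are broadcast.
    """
    out: Shape = []
    for a, b in zip(x, y):
        if a == 1:
            # x's size-1 dim is broadcast to y's size, which may itself be unknown.
            out.append(b)
        elif b == 1:
            out.append(a)
        elif a is None:
            # a is unknown and b is not 1: at run time a is either 1 (broadcast)
            # or equal to b, so the result is b (still None if b is unknown too).
            out.append(b)
        elif b is None:
            out.append(a)
        elif a == b:
            out.append(a)
        else:
            raise ValueError(f"incompatible dimensions {a} and {b}")
    return out


print(broadcast_static_shape([1, 10], [5, 10]))     # case 1: [5, 10]
print(broadcast_static_shape([1, 10], [1, 10]))     # case 2: [1, 10]
print(broadcast_static_shape([1, 10], [None, 10]))  # case 3: [None, 10]

hidden_size = 650
# sigmoid output is [batch_size, hidden_size], tanh output is [?, hidden_size]
print(broadcast_static_shape([1, hidden_size], [None, hidden_size]))  # batch_size=1: [None, 650]
print(broadcast_static_shape([2, hidden_size], [None, hidden_size]))  # batch_size=2: [2, 650]
```

The asymmetry comes entirely from the `a == 1` branch: a known size of 1 says nothing about the other side, while a known size greater than 1 pins the result.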

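Where shape inference falls short, you can use the Tensor.set_shape() method to supply the missing information. This might be useful in the BasicLSTMCell implementation, where we know more about the shape of its outputs than shape inference can determine.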
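Tensor.set_shape() merges the shape you pass with the statically inferred one: unknown dimensions are filled in, known dimensions must agree, and an incompatible shape raises a ValueError. The sketch below models only that merge rule in plain Python; `merge_static_shape` is a hypothetical helper (it ignores unknown rank), not TensorFlow's implementation. In the question's graph, the corresponding (untested) call would be something like `output.set_shape([num_steps * batch_size, size])` right after the reshape.

```python
from typing import List, Optional

# None marks a dimension that is unknown at graph-construction time.
Shape = List[Optional[int]]


def merge_static_shape(current: Shape, new: Shape) -> Shape:
    """Hypothetical model of how set_shape() merges a supplied shape into a static one."""
    if len(current) != len(new):
        raise ValueError(f"rank mismatch: {current} vs {new}")
    merged: Shape = []
    for a, b in zip(current, new):
        if a is None:
            merged.append(b)  # fill in (or keep unknown) a dimension inference missed
        elif b is None or a == b:
            merged.append(a)  # keep the dimension inference already knew
        else:
            raise ValueError(f"incompatible shapes {current} and {new}")
    return merged


num_steps, batch_size, size = 400, 1, 650
static = [None, size]  # what get_shape() reported for the validation model
# Hypothetical analogue of output.set_shape([num_steps * batch_size, size]):
print(merge_static_shape(static, [num_steps * batch_size, size]))  # [400, 650]
```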
