How does TensorFlow calculate the gradients for the tf.train.GradientDescentOptimizer?

I want to understand how TensorFlow computes the gradients for the tf.train.GradientDescentOptimizer.

If I understand section 4.1 of the TensorFlow whitepaper correctly, it computes backpropagation-based gradients by adding nodes to the TensorFlow graph that compute the derivatives of the nodes in the original graph.

When TensorFlow needs to compute the gradient of a tensor C with respect to some tensor I on which C depends, it first finds the path in the computation graph from I to C. Then it backtracks from C to I, and for each operation on the backward path it adds a node to the TensorFlow graph, composing the partial gradients along the backwards path using the chain rule. The newly added node computes the “gradient function” for the corresponding operation in the forward path. A gradient function may be registered by any operation. This function takes as input not only the partial gradients computed already along the backward path, but also, optionally, the inputs and outputs of the forward operation. [Section 4.1 TensorFlow whitepaper]
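The mechanism the whitepaper describes can be sketched in a few lines of plain Python. This is a minimal tape-based reverse-mode example of my own, not TensorFlow's actual implementation: each forward op records a "gradient node" on a tape, and the backward pass replays the tape in reverse, composing partial gradients with the chain rule. All names (`Var`, `mul`, `sin`, `gradients`) are illustrative.

```python
import math

class Var:
    """A scalar value whose gradient is accumulated during backprop."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

tape = []  # recorded (gradient-node name, backward function) pairs

def mul(a, b):
    out = Var(a.value * b.value)
    def backward():
        # chain rule for out = a * b: d(out)/da = b, d(out)/db = a
        a.grad += out.grad * b.value
        b.grad += out.grad * a.value
    tape.append(("MulGrad", backward))
    return out

def sin(a):
    out = Var(math.sin(a.value))
    def backward():
        # chain rule for out = sin(a): d(out)/da = cos(a)
        a.grad += out.grad * math.cos(a.value)
    tape.append(("SinGrad", backward))
    return out

def gradients(output, inputs):
    """Walk the recorded ops from the output back to the inputs."""
    output.grad = 1.0
    for _name, backward in reversed(tape):
        backward()
    return [v.grad for v in inputs]

x = Var(2.0)
y = Var(3.0)
z = sin(mul(x, y))            # z = sin(x * y)
dx, dy = gradients(z, [x, y])  # dz/dx = y*cos(x*y), dz/dy = x*cos(x*y)
```

Printing the first element of each tape entry shows which derivative "nodes" were appended for the backward path, which is the toy analogue of inspecting the gradient ops TensorFlow adds to its graph.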

Question 1: Does every TensorFlow node have a second node implementation that represents the derivative of the original TensorFlow node?

Question 2: Is there a way to visualize which derivative nodes get added to the graph (or any logging)?

Every node has a corresponding method that computes its backpropagation value (registered in Python with something like @ops.RegisterGradient("Sum")).

You can use the method to visualize the graph.

Note, however, that because the automatic differentiation code has to handle a wide variety of conditions, the graph it creates is quite complicated and not very readable. It is not uncommon for a simple gradient computation to take 10 op nodes where 1-2 nodes would suffice.