Tensorflow: tf.gradients between different paths of the graph

I am working on a DDPG implementation, which requires computing the gradient of one network's output (below: critic) with respect to the output of another network (below: actor). My code already uses queues instead of feed dicts for most parts, but I have not been able to do so for this specific part yet:

import tensorflow as tf
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

actor = states * 1
critic = states * 1 + actions

grads_indirect = tf.gradients(critic, actions)  # gradient w.r.t. the fed-in actions
grads_direct = tf.gradients(critic, actor)      # gradient w.r.t. the actor's output

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    act = sess.run(actor, {states: [1.]})
    print(act)  # -> [1.]
    cri = sess.run(critic, {states: [1.], actions: [2.]})
    print(cri)  # -> [3.]
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act})
    print(grad1)  # -> [[1.]]
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2)  # -> TypeError: Fetch argument has invalid type 'NoneType'

grad1 here computes the gradient w.r.t. the fed-in actions, which were previously computed by the actor. grad2 should do the same, but directly inside the graph, without feeding the actions back in; instead, the actor should be evaluated directly. The problem is that grads_direct is None:

print(grads_direct)  # [None]

How can I achieve this? Is there a dedicated "evaluate this tensor" operation I could use? Thanks!

In your example, you are not using actor to compute critic, which is why the gradient is None: there is no path from critic back to actor in the graph.

You should do something like this instead:

actor = states * 1
critic = actor + actions  # change here

grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)
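With that change there is a path from actor to critic, so both gradients are defined. A minimal end-to-end check of the fix (my sketch, reusing the placeholders from the question; the expected outputs are what TF 1.x's tf.gradients produces for this graph):

import tensorflow as tf
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

actor = states * 1
critic = actor + actions  # critic now consumes the actor's output

grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads_direct, {states: [1.], actions: [2.]}))    # -> [array([1.], dtype=float32)]
    print(sess.run(grads_indirect, {states: [1.], actions: [2.]}))  # -> [array([1.], dtype=float32)]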
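For the actual DDPG actor update, the usual TF 1.x pattern is to chain these action gradients into the actor's parameter gradients via the grad_ys argument of tf.gradients. Here is a hedged, self-contained sketch (not from the original post) that, unlike the toy example above, gives the actor a trainable variable so the chain rule has something to flow into; the minus sign turns gradient ascent on the critic into a minimization:

import tensorflow as tf
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))

# toy actor with one trainable parameter
w = tf.Variable(1.0, name='actor_w')
actor = states * w
# toy critic that consumes the actor's output directly
critic = actor * 2.0

# dQ/da: gradient of the critic w.r.t. the actor's output
action_grads = tf.gradients(critic, actor)[0]

# chain rule via grad_ys: d(-Q)/dw = -dQ/da * da/dw
actor_vars = [w]  # in a real network: the actor's trainable variables
policy_grads = tf.gradients(actor, actor_vars, grad_ys=-action_grads)
train_actor = tf.train.AdamOptimizer(1e-4).apply_gradients(
    zip(policy_grads, actor_vars))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_actor, {states: [1.]})
    print(sess.run(w))  # w moved in the direction that increases the critic's output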