"ValueError: No gradients provided for any variable" when scale_diag of MultivariateNormalDiag() is a Constant
"ValueError: No gradients provided for any variable" when scale_diag of MultivariateNormalDiag() is a Constant
Here is a code snippet that, given a state, generates an action from a state-dependent distribution (prob_policy). It then updates the weights of the graph based on a loss that is -1 times the probability of having picked that action. In the example below, both the mean (mu) and the covariance (sigma) of the MultivariateNormal are trainable/learned.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# make the graph
state = tf.placeholder(tf.float32, (1, 2), name="state")
mu = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.squeeze(sigma)
mu = tf.squeeze(mu)

prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=sigma)
action = prob_policy.sample()
picked_action_prob = prob_policy.prob(action)
loss = -tf.log(picked_action_prob)

optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)

# run the optimizer
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state_input = np.expand_dims([0., 0.], 0)
    _, action_loss = sess.run([train_op, loss], {state: state_input})
    print(action_loss)
However, when I replace this line
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=sigma)
with the following line (and comment out the lines that create the sigma layer and squeeze it)
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1.,1.])
I get the following error
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'fully_connected/weights:0' shape=(2, 2) dtype=float32_ref>", "<tf.Variable 'fully_connected/biases:0' shape=(2,) dtype=float32_ref>"] and loss Tensor("Neg:0", shape=(), dtype=float32).
I don't understand why this happens. Shouldn't it still be able to take the gradient with respect to the weights of the mu layer? Why does making the covariance of the distribution constant suddenly make it non-differentiable?
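One quick way to see which variables actually have a path to the loss (a diagnostic sketch, assuming the constant-scale graph above is already in scope) is to call tf.gradients directly; minimize() raises this ValueError exactly when every returned gradient is None:
grads = tf.gradients(loss, tf.trainable_variables())
for var, grad in zip(tf.trainable_variables(), grads):
    print(var.name, "-> None" if grad is None else "-> gradient tensor")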
System details:
- TensorFlow 1.13.1
- TensorFlow Probability 0.6.0
- Python 3.6.8
- MacOS 10.13.6
I had to change this line
action = prob_policy.sample()
to this line
action = tf.stop_gradient(prob_policy.sample())
I would love an explanation of why learning the weights of the covariance makes the weights of loc differentiable with respect to the loss while making the covariance a constant does not, and how this line change plays into that. Thanks!
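For reference, here is how that change sits in the constant-scale version of the graph (a sketch only; everything else is unchanged from the snippet above):
prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1., 1.])
action = tf.stop_gradient(prob_policy.sample())   # block gradients through the sampled action
picked_action_prob = prob_policy.prob(action)     # the density still depends on mu
loss = -tf.log(picked_action_prob)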
Some caching that we do inside MVNDiag (and other subclasses of TransformedDistribution) to enable invertibility is causing the problem here.
If you add a + 0 after .sample() (as a workaround), the gradients will work.
I would also suggest using dist.log_prob(..) instead of tf.log(dist.prob(..)). Better numerics.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# make the graph
state = tf.placeholder(tf.float32, (1, 2), name="state")
mu = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=2,
    biases_initializer=tf.ones_initializer)
sigma = tf.squeeze(sigma)
mu = tf.squeeze(mu)

prob_policy = tfp.distributions.MultivariateNormalDiag(loc=mu, scale_diag=[1., 1.])
action = prob_policy.sample() + 0      # "+ 0" sidesteps the sample cache so gradients flow to mu
loss = -prob_policy.log_prob(action)   # log_prob is numerically better than tf.log(prob(...))

optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)

# run the optimizer
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state_input = np.expand_dims([0., 0.], 0)
    _, action_loss = sess.run([train_op, loss], {state: state_input})
    print(action_loss)
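To make the caching point concrete, here is a small self-contained sketch (the tf.Variable below is just a stand-in for the mu layer, not part of the original code): log_prob of the raw sample tensor reuses the cached pre-image and ends up disconnected from loc, while log_prob of sample + 0 rebuilds the inverse transform and stays connected.
import tensorflow as tf
import tensorflow_probability as tfp

loc = tf.Variable([0., 0.])
dist = tfp.distributions.MultivariateNormalDiag(loc=loc, scale_diag=[1., 1.])

sample = dist.sample()
cached_lp = dist.log_prob(sample)      # same tensor as sample(): the cached pre-image is reused
fresh_lp = dist.log_prob(sample + 0)   # new tensor: (y - loc) / scale_diag is actually built

print(tf.gradients(cached_lp, [loc]))  # [None] -> no path from loc, hence the ValueError
print(tf.gradients(fresh_lp, [loc]))   # [<tf.Tensor ...>] -> connected, so minimize() works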