Why is batch_size being multiplied to GradientTape results in Tensorflow?
I am trying to get the gradient of a loss function with respect to another tensor, but the gradient is being multiplied by the batch size of the input I feed into the model.
import tensorflow as tf
from tensorflow.keras import Sequential, layers
#Sample States and Returns
states = tf.random.uniform(shape = (100,4))
returns = tf.constant([float(i) for i in range(100)])
#Creating dataset to feed data to model
states = tf.data.Dataset.from_tensor_slices(states)
returns = tf.data.Dataset.from_tensor_slices(returns)
#zipping datasets into one
batch_size = 4
dataset = tf.data.Dataset.zip((states, returns)).batch(batch_size)
model = Sequential([layers.Dense(128, input_shape=(4,), activation=tf.nn.relu),
                    layers.Dense(1, activation=tf.nn.tanh)])

for state_batch, returns_batch in dataset:
    with tf.GradientTape(persistent=True) as tape:
        values = model(state_batch)
        loss = returns_batch - values
    # d_loss/d_values should be -1.0, but I'm getting -1.0 * batch_size
    print(tape.gradient(loss, values))
    break
Output:
tf.Tensor(
[[-4.]
[-4.]
[-4.]
[-4.]], shape=(4, 1), dtype=float32)
Expected Output:
tf.Tensor(
[[-1.]
[-1.]
[-1.]
[-1.]], shape=(4, 1), dtype=float32)
As you can see from the code, loss = returns - values, so d_loss/d_values should be -1.0, but the result I get is d_loss/d_values = -1.0 * batch_size. Can someone point out why this happens, and how I can get the correct result?
Colab link: https://colab.research.google.com/drive/1x4pyGJ5ccRVSMzDAeLzcPXRtO7cNFnJf?usp=sharing
The problem is in this line:
loss = returns_batch - values
Here, returns_batch has shape (4,), but values has shape (4, 1). The subtraction broadcasts the two tensors, producing a loss tensor of shape (4, 4) with four repeated columns. As a result, each single element of values affects four elements of loss, which is why the gradient is scaled by the batch size. You can fix it like this:
loss = returns_batch - tf.squeeze(values, axis=1)
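With matching shapes, each element of loss depends on exactly one element of values, so the per-element gradient comes out to -1.0. A minimal sketch of the corrected loop (same model and dataset as above; the squeeze is the only change):

for state_batch, returns_batch in dataset:
    with tf.GradientTape() as tape:
        values = model(state_batch)                         # shape (4, 1)
        # squeeze to shape (4,) so the subtraction no longer broadcasts to (4, 4)
        loss = returns_batch - tf.squeeze(values, axis=1)   # shape (4,)
    print(tape.gradient(loss, values))  # [[-1.] [-1.] [-1.] [-1.]]
    break

An equivalent fix is to expand returns_batch to shape (4, 1) instead, e.g. loss = tf.expand_dims(returns_batch, axis=1) - values, so both operands already have the same shape.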