Backpropagation across two parallel layers in Keras
I want to create a network with two parallel layers (feed the same input to two different layers and combine their outputs with some mathematical operation). That said, I am not sure whether backpropagation will be handled automatically by Keras. As a simple example with a custom RNN cell:
import tensorflow as tf
from tensorflow import keras

class Example(keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super(Example, self).__init__(**kwargs)
        self.units = units
        self.state_size = units
        self.la = keras.layers.Dense(self.units)
        self.lb = keras.layers.Dense(self.units)

    def call(self, inputs, states):
        prev_output = states[0]
        # parallel layers
        a = tf.sigmoid(self.la(inputs))
        b = tf.sigmoid(self.lb(inputs))
        # combined using mathematical operation
        output = (-1 * prev_output * a) + (prev_output * b)
        return output, [output]
Now, the loss gradients for the `la` and `lb` layers are different (the gradient of the loss w.r.t. `a` carries a factor of `-prev_output`, while w.r.t. `b` it carries `prev_output`). Will this be taken care of by Keras automatically, or should we create custom gradient functions?
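To make what I mean concrete, this is the kind of single-step check I have in mind (a sketch continuing from the cell above; the shapes, the non-zero state and the sum loss are just placeholders):

cell = Example(4)
x = tf.random.normal([2, 3])        # (batch, features), placeholder shapes
state = [tf.ones([2, 4])]           # non-zero previous output so the state path matters

with tf.GradientTape(persistent=True) as tape:
    out, _ = cell(x, state)
    loss = tf.reduce_sum(out)       # placeholder loss

# one gradient per branch; these are what I expect to differ
grad_la = tape.gradient(loss, cell.la.trainable_variables)
grad_lb = tape.gradient(loss, cell.lb.trainable_variables)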
Any insights and suggestions are much appreciated :)
Keras will handle backpropagation as long as all computations are chained through tensor objects, i.e. don't convert tensors to other types such as NumPy arrays along the way, so there is nothing to worry about.
Using `tf.GradientTape`, for example, you can inspect the gradient of each layer like this:
gradients = grad_tape.gradient(total_loss, model.trainable_variables)
gradient_of_last_layer = tf.reduce_max(gradients[-1])
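For example, with the cell from the question you can wrap it in `keras.layers.RNN` and check that each branch receives its own gradient (a minimal sketch assuming TensorFlow 2.x; the shapes, targets and MSE loss are placeholders):

import tensorflow as tf
from tensorflow import keras

cell = Example(4)                    # the custom cell from the question
rnn = keras.layers.RNN(cell)

x = tf.random.normal([8, 5, 3])      # (batch, timesteps, features), placeholder shapes
y = tf.random.normal([8, 4])         # placeholder targets

with tf.GradientTape() as grad_tape:
    pred = rnn(x, initial_state=tf.ones([8, 4]))   # non-zero initial state so prev_output is not all zeros
    total_loss = tf.reduce_mean(tf.square(pred - y))

gradients = grad_tape.gradient(total_loss, rnn.trainable_variables)
for var, grad in zip(rnn.trainable_variables, gradients):
    print(var.name, grad.shape)      # both Dense sub-layers get their own gradients

Each of the two Dense sub-layers gets its own gradient entry, so no custom gradient function is needed.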