Tensorflow apply_gradients() with multiple losses
I am training a model with an intermediate output (a VAEGAN), and I have two losses:
- a KL divergence loss computed from the output layer
- a similarity (rec) loss computed from an intermediate layer.
Can I simply sum them and apply the gradients as shown below?
with tf.GradientTape() as tape:
    z_mean, z_log_sigma, z_encoder_output = self.encoder(real_images, training = True)
    kl_loss = self.kl_loss_fn(z_mean, z_log_sigma) * kl_loss_coeff

    fake_images = self.decoder(z_encoder_output)
    fake_inter_activations, logits_fake = self.discriminator(fake_images, training = True)
    real_inter_activations, logits_real = self.discriminator(real_images, training = True)
    rec_loss = self.rec_loss_fn(fake_inter_activations, real_inter_activations) * rec_loss_coeff

    total_encoder_loss = kl_loss + rec_loss

grads = tape.gradient(total_encoder_loss, self.encoder.trainable_weights)
self.e_optimizer.apply_gradients(zip(grads, self.encoder.trainable_weights))
Or do I need to keep them separate as shown below, while making the tape persistent?
with tf.GradientTape(persistent = True) as tape:
    z_mean, z_log_sigma, z_encoder_output = self.encoder(real_images, training = True)
    kl_loss = self.kl_loss_fn(z_mean, z_log_sigma) * kl_loss_coeff

    fake_images = self.decoder(z_encoder_output)
    fake_inter_activations, logits_fake = self.discriminator(fake_images, training = True)
    real_inter_activations, logits_real = self.discriminator(real_images, training = True)
    rec_loss = self.rec_loss_fn(fake_inter_activations, real_inter_activations) * rec_loss_coeff

grads_kl_loss = tape.gradient(kl_loss, self.encoder.trainable_weights)
self.e_optimizer.apply_gradients(zip(grads_kl_loss, self.encoder.trainable_weights))

grads_rec_loss = tape.gradient(rec_loss, self.encoder.trainable_weights)
self.e_optimizer.apply_gradients(zip(grads_rec_loss, self.encoder.trainable_weights))
Yes, you can generally sum the losses and compute a single gradient. Because the gradient of a sum is the sum of the individual gradients, the step taken on the summed loss is the same as taking the two steps one after the other (assuming plain gradient-descent style updates).
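To see this concretely in TensorFlow, here is a minimal sanity check with toy tensors (the variable and the two losses here are made-up stand-ins for illustration, not the VAEGAN model): the gradient of the summed loss matches the sum of the two individual gradients.

    import tensorflow as tf

    # Toy variable and losses (illustrative stand-ins, not the actual model).
    w = tf.Variable([1.0, 3.0])

    with tf.GradientTape(persistent=True) as tape:
        loss_1 = tf.reduce_sum(w ** 2)     # stand-in for kl_loss
        loss_2 = tf.reduce_sum(3.0 * w)    # stand-in for rec_loss
        total_loss = loss_1 + loss_2

    grad_sum = tape.gradient(total_loss, [w])[0]
    grad_1 = tape.gradient(loss_1, [w])[0]
    grad_2 = tape.gradient(loss_2, [w])[0]
    del tape  # release the persistent tape

    # Holds exactly: [2w + 3] == [2w] + [3]
    tf.debugging.assert_near(grad_sum, grad_1 + grad_2)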
Here is a simple example: suppose you have two weights and you are currently at the point (1, 3) (the "starting point"). The update step from loss 1 is (2, -4), and the step from loss 2 is (1, 2).
- If you apply the steps one after the other, you first move to (3, -1) and then to (4, 1).
- If you sum the gradients first, the combined step is (3, -2). Following that direction from the starting point also takes you to (4, 1).
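The same equivalence can be checked with the two patterns from your question, reduced to toy quadratic losses (a minimal sketch; the losses, optimizer, and starting point are made up for illustration). Note that the exact equality holds for plain SGD; with momentum or Adam the two schemes can differ slightly, because the optimizer state is updated between the two apply_gradients calls.

    import tensorflow as tf

    def toy_losses(w):
        loss_1 = tf.reduce_sum((w - 2.0) ** 2)   # stand-in for kl_loss
        loss_2 = tf.reduce_sum((w + 1.0) ** 2)   # stand-in for rec_loss
        return loss_1, loss_2

    # Scheme A: sum the losses, one gradient, one apply_gradients call.
    w_a = tf.Variable([1.0, 3.0])
    opt_a = tf.keras.optimizers.SGD(learning_rate=0.1)
    with tf.GradientTape() as tape:
        loss_1, loss_2 = toy_losses(w_a)
        total_loss = loss_1 + loss_2
    grads = tape.gradient(total_loss, [w_a])
    opt_a.apply_gradients(zip(grads, [w_a]))

    # Scheme B: persistent tape, two gradients, two apply_gradients calls.
    w_b = tf.Variable([1.0, 3.0])
    opt_b = tf.keras.optimizers.SGD(learning_rate=0.1)
    with tf.GradientTape(persistent=True) as tape:
        loss_1, loss_2 = toy_losses(w_b)
    grads_1 = tape.gradient(loss_1, [w_b])
    grads_2 = tape.gradient(loss_2, [w_b])
    del tape
    opt_b.apply_gradients(zip(grads_1, [w_b]))
    opt_b.apply_gradients(zip(grads_2, [w_b]))

    print(w_a.numpy(), w_b.numpy())  # both [0.8 2. ] with plain SGD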