GAN does not learn when using symmetric outputs from generator to discriminator
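I am currently trying to implement a model with a structure similar to the one in the paper Generative modeling for protein structures, and I have successfully trained a model following PyTorch's DCGAN Tutorial. The two implementations differ in the generator's output.
In the tutorial's model, the generator simply passes a normal output matrix to the discriminator. This works fine when I implement the paper's model while leaving out the symmetry and clamping, but the paper specifies:
During training, we enforce that G(z) be positive by clamping output values above zero and symmetric
When I put this into my training loop, I get a loss plot indicating that the generator is not learning.
Here is my training loop: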
# Training Loop
# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0
print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        # Unsqueezed dim one to convert [128, 64, 64] to [128, 1, 64, 64] to conform to D architecture
        real_cpu = (data.unsqueeze(dim=1).type(torch.FloatTensor)).to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()
        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Make Symmetric
        sym_fake = (fake.detach().clamp(min=0) + fake.detach().clamp(min=0).permute(0, 1, 3, 2)) / 2
        # Classify all fake batch with D
        output = netD(sym_fake).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Add the gradients from the all-real and all-fake batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()
        #adjust_optim(optimizerD, iters)
        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake.detach()).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()
        adjust_optim(optimizerG, iters)
        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())
        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
        iters += 1
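This is the training loss.
This is the loss I expected.
I make the output symmetric with the following line:
sym_fake = (fake.detach().clamp(min=0) + fake.detach().clamp(min=0).permute(0, 1, 3, 2)) / 2
I then pass it to the discriminator on the line that uses sym_fake.
Question
Is my implementation in PyTorch wrong, or am I missing something? If the network can generate images without symmetry and clamping, I do not understand why the paper makes the matrix symmetric and clamped.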
This may be because the criterion for netG receives an output that is detached from netG's parameters, so the optimizer is not (and cannot be) updating netG's parameters.
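To make the point about the detached output concrete, here is a minimal sketch of one way to read it. It assumes the fix is to drop .detach() in the generator step and to apply there the same clamp-and-symmetrize operation the discriminator was trained on. The symmetrize helper and the tiny netG / netD stand-ins are hypothetical, not the models from the question or the paper.

```python
import torch
import torch.nn as nn

def symmetrize(x):
    """Clamp to non-negative values, then average with the transpose of the last two dims."""
    x = x.clamp(min=0)
    return (x + x.transpose(-1, -2)) / 2

torch.manual_seed(0)

# Hypothetical stand-ins for netG / netD, kept tiny so the sketch runs quickly.
netG = nn.Sequential(nn.ConvTranspose2d(4, 1, kernel_size=8))  # [B, 4, 1, 1] -> [B, 1, 8, 8]
netD = nn.Sequential(nn.Flatten(), nn.Linear(64, 1), nn.Sigmoid())
criterion = nn.BCELoss()

noise = torch.randn(16, 4, 1, 1)
fake = netG(noise)
real_label = torch.ones(16)

# Generator step as written in the question: D's input is detached from G,
# so backward() never reaches G's parameters.
netG.zero_grad()
out_detached = netD(fake.detach()).view(-1)
criterion(out_detached, real_label).backward()
grads_detached = [p.grad for p in netG.parameters()]

# Generator step with the graph kept intact and the symmetrization applied.
netG.zero_grad()
out_attached = netD(symmetrize(fake)).view(-1)
criterion(out_attached, real_label).backward()
grads_attached = [p.grad for p in netG.parameters()]

print("detached: ", [g is None for g in grads_detached])
print("attached: ", [g is not None for g in grads_attached])
```

In the discriminator step, detaching fake is still appropriate, since only D should be updated there. In the generator step, the loss has to stay connected to netG for optimizerG.step() to have any gradients to apply.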