Output of a CNN doesn't change much with the input

Question

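I have been trying to implement Actor-Critic with a convolutional neural network. The state of the reinforcement learning agent consists of two different images. After (random) initialization, the output of the CNN (the actor's action) is (almost) the same for different inputs. As a result, the agent never learns anything useful, even after training.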

State definition (2 inputs):

Input 1: a [1,1,90,128] image with a maximum pixel value of 45.

Input 2: a [1,1,45,80] image with a maximum pixel value of 45.
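With these input shapes, the flatten sizes hard-coded in the actor (`19*29*16` for `fc11`, `8*17*16` for `fc21`) follow from the usual output-size rule for an unpadded, stride-1 5x5 convolution followed by 2x2 max-pooling. A small plain-Python check (the helper names are hypothetical) suggests those sizes are consistent with the two inputs:

```python
def conv_out(size, kernel=5, stride=1, padding=0):
    # Conv2d output size: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    # MaxPool2d output size (no padding, floor mode): floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1


def branch_out(h, w):
    # Two (conv 5x5 -> max-pool 2x2) stages, as in each branch of the actor;
    # the ReLU in between does not change the shape.
    for _ in range(2):
        h, w = pool_out(conv_out(h)), pool_out(conv_out(w))
    return h, w


print(branch_out(90, 128))  # (19, 29) -> 19*29*16 = 8816 inputs to fc11
print(branch_out(45, 80))   # (8, 17)  -> 8*17*16 = 2176 inputs to fc21
```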

Expected output of the actor: [x,y], a two-dimensional vector that depends on the state. Here x is expected to lie in [0,160] and y in [0,112].
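Note that the actor defined below ends with `max_action * tanh(...)`, whose range is (-160, 160) x (-112, 112) rather than the [0,160] x [0,112] stated here. One common way to hit the stated box is an affine rescale of tanh. A minimal NumPy sketch, where the helper name `tanh_to_box` is hypothetical:

```python
import numpy as np

MAX_ACTION = np.array([160.0, 112.0])  # upper bounds for x and y stated above


def tanh_to_box(pre_activation, max_action=MAX_ACTION):
    # tanh is in (-1, 1), so (tanh + 1) / 2 is in (0, 1);
    # scaling by max_action gives (0, max_action) per dimension.
    return max_action * (np.tanh(pre_activation) + 1.0) / 2.0


print(tanh_to_box(np.array([0.0, 0.0])))  # [80. 56.]  (centre of the box)
```

In the PyTorch actor the equivalent last layer would be `z = self.max_action * (torch.tanh(self.fc4(z)) + 1) / 2`.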

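I tried different modifications of the input:

1: Feed both images in as they are.

2: Normalize both images as (img/45), so that pixel values lie in [0,1].

3: Normalize both images as 2*((img/45)-0.5), so that pixel values lie in [-1,1].

4: Normalize both images as (img-mean)/std.

Result: the output of the CNN stays almost the same.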
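The four preprocessing variants listed above can be written as small NumPy functions. This is a sketch: the question does not say whether the mean/std in variant 4 were computed per image or over the whole dataset, so per-image statistics and the small `eps` (to avoid dividing by zero on an all-zero sparse image) are assumptions, as are the function names.

```python
import numpy as np

MAX_PIXEL = 45.0  # maximum pixel value of both state images


def as_is(img):
    # 1: feed the image unchanged (only cast to float32)
    return img.astype(np.float32)


def scale_01(img):
    # 2: img / 45 -> values in [0, 1]
    return img.astype(np.float32) / MAX_PIXEL


def scale_pm1(img):
    # 3: 2 * (img / 45 - 0.5) -> values in [-1, 1]
    return 2.0 * (img.astype(np.float32) / MAX_PIXEL - 0.5)


def standardize(img, eps=1e-8):
    # 4: (img - mean) / std, using this image's own statistics (assumption)
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + eps)


# A sparse example shaped like input 1: mostly zeros, one bright patch
img = np.zeros((90, 128))
img[40:50, 60:70] = 45
for f in (as_is, scale_01, scale_pm1, standardize):
    out = f(img)
    print(f.__name__, float(out.min()), float(out.max()))
```

For a sparse image, variants 3 and 4 turn the zero background into a non-zero constant (-1, or -mean/std respectively), while variants 1 and 2 keep it at zero.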

The actor is defined as follows:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Actor(nn.Module):
    def __init__(self, action_dim, max_action):
        super(Actor, self).__init__()
        # state image [1,1,90,128]
        self.conv11 = nn.Conv2d(1, 16, 5)
        self.conv11_bn = nn.BatchNorm2d(16)
        self.conv12 = nn.Conv2d(16, 16, 5)
        self.conv12_bn = nn.BatchNorm2d(16)
        self.fc11 = nn.Linear(19*29*16, 500)  # 90x128 -> 19x29 after two conv(5)+pool(2) stages
        # dim image [1,1,45,80]
        self.conv21 = nn.Conv2d(1, 16, 5)
        self.conv21_bn = nn.BatchNorm2d(16)
        self.conv22 = nn.Conv2d(16, 16, 5)
        self.conv22_bn = nn.BatchNorm2d(16)
        self.fc21 = nn.Linear(8*17*16, 250)   # 45x80 -> 8x17 after two conv(5)+pool(2) stages
        # common pool (has no parameters, so it can be shared)
        self.pool = nn.MaxPool2d(2, 2)
        # after concatenation: 500 + 250 = 750 features
        self.fc2 = nn.Linear(750, 100)
        self.fc3 = nn.Linear(100, 10)
        self.fc4 = nn.Linear(10, action_dim)
        self.max_action = max_action

    def forward(self, x, y):
        # state image: each conv layer is followed by its own BatchNorm
        x = self.conv11_bn(self.pool(F.relu(self.conv11(x))))
        x = self.conv12_bn(self.pool(F.relu(self.conv12(x))))
        x = x.view(-1, 19*29*16)
        x = F.relu(self.fc11(x))
        # state dim
        y = self.conv21_bn(self.pool(F.relu(self.conv21(y))))
        y = self.conv22_bn(self.pool(F.relu(self.conv22(y))))
        y = y.view(-1, 8*17*16)
        y = F.relu(self.fc21(y))
        # concatenate both branches
        z = torch.cat((x, y), dim=1)
        z = F.relu(self.fc2(z))
        z = F.relu(self.fc3(z))
        # tanh is in (-1, 1), so z lies in (-max_action, max_action)
        z = self.max_action * torch.tanh(self.fc4(z))
        return z

# to read different sample states for testing
obs = []
for i in range(200):
    obs.append(np.load('eval_episodes/obs_'+str(i)+'.npy',allow_pickle=True))

obs = np.array(obs)

def tensor_from_numpy(state):
    # add batch and channel dimensions: [H, W] -> [batch_size, channels, height, width] = [1, 1, H, W]
    state_img = torch.from_numpy(state).float()
    return state_img.unsqueeze(0).unsqueeze(0).to(device)


# the model and max_action must live on the same device as the inputs
actor = Actor(2, torch.tensor([160.0, 112.0], device=device)).to(device)
for i in range(20):
    a = tensor_from_numpy(obs[i][0])
    b = tensor_from_numpy(obs[i][2])
    print(actor(a, b))

Output of the above code:

tensor([[28.8616,  3.0934]], grad_fn=<MulBackward0>)
tensor([[27.4125,  3.2864]], grad_fn=<MulBackward0>)
tensor([[28.2210,  2.6859]], grad_fn=<MulBackward0>)
tensor([[27.6312,  3.9528]], grad_fn=<MulBackward0>)
tensor([[25.9290,  4.2942]], grad_fn=<MulBackward0>)
tensor([[26.9652,  4.5730]], grad_fn=<MulBackward0>)
tensor([[27.1342,  2.9612]], grad_fn=<MulBackward0>)
tensor([[27.6494,  4.2218]], grad_fn=<MulBackward0>)
tensor([[27.3122,  1.9945]], grad_fn=<MulBackward0>)
tensor([[29.6915,  1.9938]], grad_fn=<MulBackward0>)
tensor([[28.2001,  2.5967]], grad_fn=<MulBackward0>)
tensor([[26.8502,  4.4917]], grad_fn=<MulBackward0>)
tensor([[28.6489,  3.2022]], grad_fn=<MulBackward0>)
tensor([[28.1455,  2.7610]], grad_fn=<MulBackward0>)
tensor([[27.2369,  3.4243]], grad_fn=<MulBackward0>)
tensor([[25.9513,  5.3057]], grad_fn=<MulBackward0>)
tensor([[28.1400,  3.3242]], grad_fn=<MulBackward0>)
tensor([[28.2049,  2.6622]], grad_fn=<MulBackward0>)
tensor([[26.7446,  2.5966]], grad_fn=<MulBackward0>)
tensor([[25.3867,  5.0346]], grad_fn=<MulBackward0>)

The state files (.npy) can be found here. For different states the action should vary across [0-160, 0-112], but here the outputs differ only slightly.
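To put "only slightly" in numbers, the spread (max minus min) of the 20 printed outputs can be compared with the widths of the expected ranges. The values below are copied from the printed output; the variable names are illustrative only.

```python
# Actor outputs (x, y) printed for 20 different states
outputs = [
    (28.8616, 3.0934), (27.4125, 3.2864), (28.2210, 2.6859), (27.6312, 3.9528),
    (25.9290, 4.2942), (26.9652, 4.5730), (27.1342, 2.9612), (27.6494, 4.2218),
    (27.3122, 1.9945), (29.6915, 1.9938), (28.2001, 2.5967), (26.8502, 4.4917),
    (28.6489, 3.2022), (28.1455, 2.7610), (27.2369, 3.4243), (25.9513, 5.3057),
    (28.1400, 3.3242), (28.2049, 2.6622), (26.7446, 2.5966), (25.3867, 5.0346),
]
expected_range = {"x": 160.0, "y": 112.0}  # widths of [0, 160] and [0, 112]

spreads = {}
for dim, name in enumerate(("x", "y")):
    values = [o[dim] for o in outputs]
    spreads[name] = max(values) - min(values)
    share = 100 * spreads[name] / expected_range[name]
    print(f"{name}: spread {spreads[name]:.4f} = {share:.1f}% of the expected range")
# x: spread 4.3048 = 2.7% of the expected range
# y: spread 3.3119 = 3.0% of the expected range
```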

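Note: the input images are initially sparse (they contain many zeros).

Is there a problem with the state pixel values or with the network definition?

Edit: I think the problem must be related to the normalization or the sparsity of the inputs, because I also tried the same network in TensorFlow and ran into the same problem there.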

Answer 1

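The problem was unsuitable weight initialization. I used Gaussian initialization with twice the default standard deviation, which made the network produce different outputs for different inputs. However, after a few episodes of training the actor started giving the same values again, this time because the critic network became saturated.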
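A minimal PyTorch sketch of the initialization described in this answer. The answer does not say which "default" was doubled; this sketch assumes PyTorch's default `Conv2d`/`Linear` weight initialization, which samples from U(-1/sqrt(fan_in), 1/sqrt(fan_in)) and therefore has standard deviation 1/sqrt(3 * fan_in). The helper names `default_weight_std` and `init_gaussian` are hypothetical, and biases are left at their defaults.

```python
import math

import torch
import torch.nn as nn


def default_weight_std(weight):
    # PyTorch's default Conv2d/Linear weight init is U(-1/sqrt(fan_in), 1/sqrt(fan_in)),
    # whose standard deviation is 1/sqrt(3 * fan_in) (assumed meaning of "default").
    fan_in = weight[0].numel()  # in_features, or in_channels * kH * kW for a conv
    return 1.0 / math.sqrt(3.0 * fan_in)


def init_gaussian(model, scale=2.0):
    # Re-draw every Conv2d/Linear weight from N(0, (scale * default std)^2).
    # Biases keep their default initialization.
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(m.weight, mean=0.0, std=scale * default_weight_std(m.weight))


# Usage on a stand-in layer; with the question's code this would be init_gaussian(actor)
torch.manual_seed(0)
layer = nn.Linear(750, 100)
init_gaussian(layer)
# measured std should be close to the target 2 / sqrt(3 * 750)
print(layer.weight.std().item(), 2 * default_weight_std(layer.weight))
```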

