LSTM network for space-invaders RL (Keras)

Question

I'm new to reinforcement learning and I'm trying to build a Space Invaders agent that uses an LSTM. I tried the network from this paper, but I keep running into problems:

- If I use Conv2D, the dimensions don't fit the LSTM and I get this error:

ValueError: Input 0 is incompatible with layer conv_lst_m2d_1: expected ndim=5, found ndim=4

Here is the code:

    self.model = Sequential()
    self.model.add(Conv2D(32,kernel_size=8,strides=4,activation='relu',input_shape=(None,84,84,1)))
    self.model.add(Conv2D(64,kernel_size=4,strides=2,activation='relu'))
    self.model.add(Conv2D(64,kernel_size=3, strides=1,activation='relu'))
    self.model.add(ConvLSTM2D(512, kernel_size=(3,3), padding='same', return_sequences=False))
    self.model.add(Dense(4, activation='relu'))
    self.model.compile(loss='mse', optimizer=Adam(lr=0.0001))
    self.model.summary()

- If I use Conv3D, which outputs a 5D tensor, I can't use a single image as input:

ValueError: Error when checking input: expected conv3d_1_input to have 5 dimensions, but got array with shape (1, 84, 84, 1)

Code:

    self.model.add(Conv3D(32,kernel_size=8,strides=4,activation='relu',input_shape=(None,84,84,1)))
    self.model.add(Conv3D(64,kernel_size=4,strides=2,activation='relu'))
    self.model.add(Conv3D(64,kernel_size=3, strides=1,activation='relu'))
    self.model.add(ConvLSTM2D(512, kernel_size=(3,3), padding='same', return_sequences=False))
    self.model.add(Dense(4, activation='relu'))
    self.model.compile(loss='mse', optimizer=Adam(lr=0.0001))
    self.model.summary()

(Edit)

Network summary (of the second network):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv3d_1 (Conv3D)            (None, None, 20, 20, 32)  16416     
_________________________________________________________________
conv3d_2 (Conv3D)            (None, None, 9, 9, 64)    131136    
_________________________________________________________________
conv3d_3 (Conv3D)            (None, None, 7, 7, 64)    110656    
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D)  (None, 7, 7, 512)         10618880  
_________________________________________________________________
dense_1 (Dense)              (None, 7, 7, 4)           2052      
=================================================================
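The numbers in this summary can be reproduced by hand, which helps when debugging shape errors. The sketch below assumes Keras' default padding='valid' for the three conv layers, and uses the usual ConvLSTM2D parameter count (four gates, each with an input kernel, a recurrent kernel and a bias). The helper names are illustrative only. It also shows that, because an integer kernel_size=8 in Conv3D is applied to the time axis as well, this Conv3D stack needs at least 36 frames per sample before the time axis shrinks to zero.

```python
def conv_out(size, kernel, stride):
    # 'valid' padding (Keras default): floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

def convlstm2d_params(kernel, in_channels, filters):
    # 4 gates, each with an input kernel (in_channels -> filters),
    # a recurrent kernel (filters -> filters) and one bias per filter
    return 4 * (kernel * kernel * (in_channels + filters) * filters + filters)

def dense_params(in_features, units):
    # Dense acts on the last axis only: weights plus one bias per unit
    return in_features * units + units

conv_stack = [(8, 4), (4, 2), (3, 1)]  # (kernel_size, strides) of the three conv layers

def chain(size):
    # Apply the three conv layers' size formula to one axis in turn
    sizes = []
    for kernel, stride in conv_stack:
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes

print(chain(84))                      # spatial sizes: [20, 9, 7]
print(convlstm2d_params(3, 64, 512))  # 10618880
print(dense_params(512, 4))           # 2052
# Conv3D with kernel_size=8 also slides over the time axis:
print(chain(36)[-1], chain(35)[-1])   # 1 0 -> with 35 frames no time steps are left
```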

The input data shape is: (84, 84, 1)
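Both errors are rank mismatches: a plain Conv2D model expects 4D input (batch, height, width, channels), while ConvLSTM2D, Conv3D and TimeDistributed models expect 5D input (batch, time, height, width, channels). A minimal NumPy sketch of adding the missing axes to one (84, 84, 1) frame, using an all-zero placeholder frame as an assumption:

```python
import numpy as np

# Placeholder for one preprocessed 84x84 grayscale frame
frame = np.zeros((84, 84, 1), dtype=np.float32)

# Plain Conv2D model: (batch, height, width, channels) -> ndim 4
batch_4d = frame[np.newaxis]
# ConvLSTM2D / TimeDistributed / Conv3D model: (batch, time, height, width, channels) -> ndim 5
batch_5d = frame[np.newaxis, np.newaxis]

print(batch_4d.shape, batch_4d.ndim)  # (1, 84, 84, 1) 4
print(batch_5d.shape, batch_5d.ndim)  # (1, 1, 84, 84, 1) 5
```

A single time step is a valid 5D input, but a recurrent layer only has something to remember when each sample contains several consecutive frames.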

Answer 1

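You need to wrap Conv2D in TimeDistributed. This tells your network to treat the data as temporal (I guess that's what you want), and it produces the 5D shape the LSTM layer expects.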
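Conceptually, TimeDistributed applies the same inner layer, with the same weights, to every time step independently. The NumPy sketch below mimics that by folding the time axis into the batch axis, applying a per-frame function, and unfolding again. The per-frame function here is a 2x2 average pooling, a hypothetical stand-in for Conv2D chosen only because it needs no weights; this is an illustration, not Keras' actual implementation.

```python
import numpy as np

def avg_pool_2x2(x):
    # Stand-in for a Conv2D: maps (batch, H, W, C) -> (batch, H/2, W/2, C)
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

def time_distributed(fn, x):
    # x: (batch, time, H, W, C). Fold time into batch, apply fn once, unfold.
    b, t = x.shape[:2]
    y = fn(x.reshape((b * t,) + x.shape[2:]))
    return y.reshape((b, t) + y.shape[1:])

rng = np.random.default_rng(0)
x = rng.random((2, 3, 84, 84, 1))  # 2 samples, 3 frames each
y = time_distributed(avg_pool_2x2, x)
print(y.shape)  # (2, 3, 42, 42, 1): the time axis survives
```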

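    import tensorflow as tf

    model = tf.keras.Sequential()
    # Each sample is a sequence of frames: (time, height, width, channels); time is left open (None)
    model.add(tf.keras.layers.Input(shape=(None, 84, 84, 1)))
    # TimeDistributed applies the same Conv2D to every frame and keeps the time axis
    model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(32, kernel_size=8, strides=4, activation='relu')))
    model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(64, kernel_size=4, strides=2, activation='relu')))
    model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(64, kernel_size=3, strides=1, activation='relu')))
    # ConvLSTM2D consumes the 5D sequence; return_sequences=False keeps only the last step
    model.add(tf.keras.layers.ConvLSTM2D(512, kernel_size=(3, 3), padding='same', return_sequences=False))
    # The time axis is gone here, so TimeDistributed iterates over the first spatial axis;
    # the output shape is (None, 7, 7, 4)
    model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(4, activation='relu')))
    # TF2 optimizers take `learning_rate`; the old `lr` argument is deprecated
    model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001))

    model.summary()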

Running this prints:

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_12 (TimeDis (None, None, 20, 20, 32)  2080      
_________________________________________________________________
time_distributed_13 (TimeDis (None, None, 9, 9, 64)    32832     
_________________________________________________________________
time_distributed_14 (TimeDis (None, None, 7, 7, 64)    36928     
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D)  (None, 7, 7, 512)         10618880  
_________________________________________________________________
time_distributed_15 (TimeDis (None, 7, 7, 4)           2052      
=================================================================
Total params: 10,692,772
Trainable params: 10,692,772
Non-trainable params: 0
_________________________________________________________________
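In an RL loop the model needs a sequence of recent frames for every prediction. One common approach, sketched here under the assumption that frames are already preprocessed to (84, 84, 1), is a fixed-length history built on collections.deque. The class name FrameHistory and its methods are hypothetical, not part of Keras.

```python
from collections import deque

import numpy as np

class FrameHistory:
    """Hypothetical helper: keeps the last `length` frames of an episode."""

    def __init__(self, length):
        self.length = length
        self.frames = deque(maxlen=length)

    def reset(self, first_frame):
        # At episode start, fill the history by repeating the first frame
        self.frames.clear()
        for _ in range(self.length):
            self.frames.append(first_frame)

    def push(self, frame):
        # deque(maxlen=...) drops the oldest frame automatically
        self.frames.append(frame)

    def as_input(self):
        # (time, 84, 84, 1) -> (1, time, 84, 84, 1): one 5D sample for model.predict
        return np.stack(self.frames, axis=0)[np.newaxis].astype(np.float32)

history = FrameHistory(4)
history.reset(np.zeros((84, 84, 1)))
history.push(np.ones((84, 84, 1)))
x = history.as_input()
print(x.shape)  # (1, 4, 84, 84, 1)
```

Passing such an array to the model above yields an output of shape (1, 7, 7, 4), as the summary indicates, rather than one value per action; a DQN-style agent would usually flatten before the final Dense layer to get exactly 4 Q-values.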

Answer 2

First, try printing the model's input and output details.

The output will look something like this:

Model input details:

[{'name': 'input', 'index': 451, 'shape': array([  1, 160, 160,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]

Model output details:

[{'name': 'embeddings', 'index': 450, 'shape': array([  1, 512], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]

Once you have these details, set the value of input_shape to match them.
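The dictionaries above are in the format returned by TensorFlow Lite's tf.lite.Interpreter.get_input_details() and get_output_details(). For a Keras model, tf.keras 2.x reports the same information through model.input_shape and model.output_shape, with None marking dimensions left open (batch, and here time). The pure-Python sketch below uses a hypothetical helper, shape_matches, to compare such an expected shape against an array's shape; the expected shape is assumed to be what the TimeDistributed model from Answer 1 reports.

```python
def shape_matches(expected, actual):
    """Hypothetical helper: True if `actual` fits `expected`.

    `expected` uses None for open dimensions. A rank mismatch returns False;
    that is the situation behind Keras' "expected ... 5 dimensions" errors.
    """
    if len(expected) != len(actual):
        return False
    return all(e is None or e == a for e, a in zip(expected, actual))

# Assumed value of model.input_shape for a model built on Input(shape=(None, 84, 84, 1))
expected = (None, None, 84, 84, 1)
print(shape_matches(expected, (1, 84, 84, 1)))     # False: the time axis is missing
print(shape_matches(expected, (1, 4, 84, 84, 1)))  # True
```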
