FailedPreconditionError while using DDPG RL algorithm, in python, with keras, keras-rl2
I am training a DDPG agent on a custom environment written with OpenAI Gym, and I get an error while training the model.
While searching online for a solution, I found that some people who ran into a similar problem were able to fix it by initializing the variables.
For example, by using:
tf.global_variables_initializer()
However, the TensorFlow 2.5.0 version I am using does not have this method, which means there should be some other way to resolve this error, but I have not been able to find a solution.
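The closest TF 2.x equivalent I could find is the TF1 compat shim, run against the legacy Keras backend session (which keras-rl2 still seems to use). I am not sure this is the right fix, but for reference:
import tensorflow as tf
# TF1-style initialization only exists under tf.compat.v1 in TF 2.x.
# This assumes keras-rl2 is driving the legacy Keras backend session.
sess = tf.compat.v1.keras.backend.get_session()
sess.run(tf.compat.v1.global_variables_initializer())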
These are the libraries and versions I am using:
tensorflow: 2.5.0
gym: 0.18.3
numpy: 1.19.5
keras: 2.4.3
keras-rl2: 1.0.5 (the DDPG agent comes from this library)
Error/Stacktrace:
Training for 1000 steps ...
Interval 1 (0 steps performed)
17/10000 [..............................] - ETA: 1:04 - reward: 256251545.0121
C:\Users\vchou\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\engine\training.py:2401: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
100/10000 [..............................] - ETA: 1:03 - reward: 272267266.5754
C:\Users\vchou\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\keras\engine\training.py:2426: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
---------------------------------------------------------------------------
FailedPreconditionError Traceback (most recent call last)
<ipython-input-17-0938aa6056e8> in <module>
1 # Training
----> 2 ddpgAgent.fit(env, 1000, verbose=1, nb_max_episode_steps = 100)
~\anaconda3\envs\AdSpendProblem\lib\site-packages\rl\core.py in fit(self, env, nb_steps, action_repetition, callbacks, verbose, visualize, nb_max_start_steps, start_step_policy, log_interval, nb_max_episode_steps)
191 # Force a terminal state.
192 done = True
--> 193 metrics = self.backward(reward, terminal=done)
194 episode_reward += reward
195
~\anaconda3\envs\AdSpendProblem\lib\site-packages\rl\agents\ddpg.py in backward(self, reward, terminal)
279 state0_batch_with_action = [state0_batch]
280 state0_batch_with_action.insert(self.critic_action_input_idx, action_batch)
--> 281 metrics = self.critic.train_on_batch(state0_batch_with_action, targets)
282 if self.processor is not None:
283 metrics += self.processor.metrics
~\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\engine\training_v1.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
1075 self._update_sample_weight_modes(sample_weights=sample_weights)
1076 self._make_train_function()
-> 1077 outputs = self.train_function(ins) # pylint: disable=not-callable
1078
1079 if reset_metrics:
~\anaconda3\envs\AdSpendProblem\lib\site-packages\keras\backend.py in __call__(self, inputs)
4017 self._make_callable(feed_arrays, feed_symbols, symbol_vals, session)
4018
-> 4019 fetched = self._callable_fn(*array_vals,
4020 run_metadata=self.run_metadata)
4021 self._call_fetch_callbacks(fetched[-len(self._fetches):])
~\anaconda3\envs\AdSpendProblem\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
1478 try:
1479 run_metadata_ptr = tf_session.TF_NewBuffer() if run_metadata else None
-> 1480 ret = tf_session.TF_SessionRunCallable(self._session._session,
1481 self._handle, args,
1482 run_metadata_ptr)
FailedPreconditionError: Could not find variable dense_5_1/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Resource localhost/dense_5_1/kernel/class tensorflow::Var does not exist.
[[{{node ReadVariableOp_21}}]]
The actor and critic networks are as follows:
ACTOR NETWORK
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 10) 0
_________________________________________________________________
dense (Dense) (None, 32) 352
_________________________________________________________________
activation (Activation) (None, 32) 0
_________________________________________________________________
dense_1 (Dense) (None, 32) 1056
_________________________________________________________________
activation_1 (Activation) (None, 32) 0
_________________________________________________________________
dense_2 (Dense) (None, 32) 1056
_________________________________________________________________
activation_2 (Activation) (None, 32) 0
_________________________________________________________________
dense_3 (Dense) (None, 10) 330
_________________________________________________________________
activation_3 (Activation) (None, 10) 0
=================================================================
Total params: 2,794
Trainable params: 2,794
Non-trainable params: 0
_________________________________________________________________
None
CRITIC NETWORK
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
observation_input (InputLayer) [(None, 1, 10)] 0
__________________________________________________________________________________________________
action_input (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 10) 0 observation_input[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 20) 0 action_input[0][0]
flatten_1[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 32) 672 concatenate[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 32) 0 dense_4[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 32) 1056 activation_4[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 32) 0 dense_5[0][0]
__________________________________________________________________________________________________
dense_6 (Dense) (None, 32) 1056 activation_5[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 32) 0 dense_6[0][0]
__________________________________________________________________________________________________
dense_7 (Dense) (None, 1) 33 activation_6[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 1) 0 dense_7[0][0]
==================================================================================================
Total params: 2,817
Trainable params: 2,817
Non-trainable params: 0
__________________________________________________________________________________________________
None
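For reference, here is a sketch of how networks matching these summaries could be built. The Activation layers' actual functions are not visible in the summaries, so the relu/sigmoid/linear choices below are placeholders, and the imports are shown as tensorflow.keras (the variant that ended up working, see the update at the end):
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Activation, Flatten, Input, Concatenate
# Actor: maps a (1, 10) observation to a 10-dimensional action.
actor = Sequential([
    Flatten(input_shape=(1, 10)),
    Dense(32), Activation('relu'),
    Dense(32), Activation('relu'),
    Dense(32), Activation('relu'),
    Dense(10), Activation('sigmoid'),  # output activation is a placeholder
])
# Critic: takes the action and the observation, outputs a single Q-value.
action_input = Input(shape=(10,), name='action_input')
observation_input = Input(shape=(1, 10), name='observation_input')
flattened_observation = Flatten()(observation_input)
x = Concatenate()([action_input, flattened_observation])
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[observation_input, action_input], outputs=x)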
Here is the code for the DDPG agent:
# Create DDPG agent
ddpgAgent = DDPGAgent(
    nb_actions=nb_actions,
    actor=actor,
    critic=critic,
    critic_action_input=action_input,
    memory=memory,
    nb_steps_warmup_critic=100,
    nb_steps_warmup_actor=100,
    random_process=random_process,
    gamma=0.99,
    target_model_update=1e-3,
)
ddpgAgent.compile(Adam(learning_rate=0.001, clipnorm=1.0), metrics=['mae'])
Update: I was able to resolve this error by replacing the imports from keras with imports from tensorflow.keras, although I do not know why keras itself does not work.
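Concretely, the imports that work now look like the following; the memory and random-process lines assume the usual keras-rl2 classes for DDPG:
# keras-rl2 pieces (unchanged)
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess
# Keras pieces: imported from tensorflow.keras instead of the standalone keras package
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Activation, Flatten, Input, Concatenate
from tensorflow.keras.optimizers import Adam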