How to extend an agent class in ChainerRL in Python
I want to extend the PPO agent class in ChainerRL. I did the following:
    class exPPO(chainerrl.agents.PPO):
        def act_and_train(self, obs, reward):
            action = chainerrl.agents.PPO(self, obs, reward)
            print("this is my exPPO act and train")
            return action
I tried it with the CartPole env from gym, but when the script reaches
    obs, reward, done, _ = env.step(action)
it just crashes with the following output:
this is my exPPO act and train
Traceback (most recent call last):
  File "C:\personal_if\extendPPO.py", line 184, in <module>
    obs, reward, done, _ = env.step(action)
  File "C:\Users\PareekHi\AppData\Local\Programs\Python\Python37\lib\site-packages\gym\wrappers\time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "C:\Users\PareekHi\AppData\Local\Programs\Python\Python37\lib\site-packages\gym\envs\classic_control\cartpole.py", line 104, in step
    assert self.action_space.contains(action), err_msg
AssertionError: <chainerrl.agents.ppo.PPO object at 0x000002A986D2F508> (<class 'chainerrl.agents.ppo.PPO'>) invalid
Please help me understand how to extend the PPO class here.
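Inside act_and_train, call the parent implementation through super(). Writing chainerrl.agents.PPO(self, obs, reward) calls the class constructor, so it builds a new PPO object instead of returning an action, and that object is what the AssertionError in the traceback reports as invalid. Replace that line with:
    action = super().act_and_train(obs, reward)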
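To make the difference concrete, here is a minimal sketch that runs without ChainerRL installed. `BaseAgent` is a hypothetical stand-in for `chainerrl.agents.PPO` (its constructor arguments and the constant action `0` are assumptions made purely for illustration), while `BrokenAgent` and `ExAgent` mirror the failing and the corrected override from the question.

```python
class BaseAgent:
    """Hypothetical stand-in for chainerrl.agents.PPO (not the real class)."""

    def __init__(self, model=None, optimizer=None):
        self.model = model
        self.optimizer = optimizer

    def act_and_train(self, obs, reward):
        # Assumption for illustration: always pick action 0,
        # which would be a valid discrete CartPole action.
        return 0


class BrokenAgent(BaseAgent):
    def act_and_train(self, obs, reward):
        # Calling the class invokes its constructor, so this returns
        # a brand-new agent object rather than an action.
        return BaseAgent(self, obs)


class ExAgent(BaseAgent):
    def act_and_train(self, obs, reward):
        # super() dispatches to the parent's act_and_train on this instance.
        action = super().act_and_train(obs, reward)
        print("this is my exPPO act and train")
        return action


broken_result = BrokenAgent().act_and_train(obs=None, reward=0.0)
fixed_result = ExAgent().act_and_train(obs=None, reward=0.0)
```

With the real library, the same pattern should apply: keep `chainerrl.agents.PPO` as the base class and leave the constructor untouched, so the subclass is still created with the usual model and optimizer arguments.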