How can I start the environment from a custom initial state for Mountain Car?
I want to start OpenAI Gym's continuous Mountain Car environment from a custom initial point, but OpenAI Gym does not provide any method for this. I looked into the code of the environment and found that there is an attribute state that holds the state information. I tried to change that attribute manually, but it does not work: as you can see in the attached code, the observation returned by the step function does not match the env.state variable.
I think some basic Python issue is preventing me from accessing the attribute. Is there a way to access it, or some other way to start from a custom initial state? I know I could also create a custom environment (like this) from the existing code and add the functionality; I found one issue at the GitHub repo where, I think, they suggest the same (a sketch of that idea follows the output below).
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
env.reset()
print(env.state)

env.state = np.array([-0.4, 0])
print(env.state)

for i in range(50):
    obs, _, _, _ = env.step([1])  # just push right at every step
    print(obs, env.state)         # the observation and env.state are different
    env.render()
Output of the code:
[-0.52196493 0. ]
[-0.4 0. ]
[-0.52047719 0.00148775] [-0.4 0. ]
[-0.51751285 0.00296433] [-0.4 0. ]
[-0.51309416 0.00441869] [-0.4 0. ]
[-0.50725424 0.00583992] [-0.4 0. ]
...
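For the custom-environment route mentioned in the question, a minimal sketch could look like the following, assuming the old Gym API in which reset returns the observation directly. CustomStartWrapper and its start_state argument are names invented here for illustration, not part of Gym:

import gym
import numpy as np

class CustomStartWrapper(gym.Wrapper):
    """Hypothetical wrapper: reset() places the inner env in a fixed state."""
    def __init__(self, env, start_state):
        super().__init__(env)
        self.start_state = np.array(start_state, dtype=np.float64)

    def reset(self, **kwargs):
        self.env.reset(**kwargs)                             # run the normal reset first
        self.env.unwrapped.state = self.start_state.copy()   # then overwrite the inner state
        return self.start_state.copy()                       # old-Gym reset returns the observation

env = CustomStartWrapper(gym.make("MountainCarContinuous-v0"), [-0.4, 0])
obs = env.reset()
print(obs)  # [-0.4  0. ]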
One of my colleagues found the mistake: I need to use env.env.state instead of env.state. The modified code is attached:
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
env.reset()
env.env.state = np.array([-0.4, 0])  # set the state on the inner env, not the wrapper
print(env.env.state)

for i in range(50):
    obs, _, _, _ = env.step([1])
    print(obs, env.env.state)
    env.render()
Output:
[-0.4 0. ]
[-0.39940589 0.00059411] [-0.39940589 0.00059411]
[-0.39822183 0.00118406] [-0.39822183 0.00118406]
[-0.39645609 0.00176575] [-0.39645609 0.00176575]
[-0.39412095 0.00233513] [-0.39412095 0.00233513]
[-0.39123267 0.00288829] [-0.39123267 0.00288829]
[-0.38781124 0.00342142] [-0.38781124 0.00342142]
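The reason env.env works: gym.make does not return the raw environment but wraps it (for example in a TimeLimit wrapper). Assigning to env.state creates a new attribute on the outer wrapper, while step keeps using the inner environment's own state. A small helper that writes through all wrapper layers at once might look like this; set_state is a hypothetical name, not a Gym API:

import numpy as np

def set_state(env, state):
    # env.unwrapped is the innermost, unwrapped environment object,
    # so this assignment reaches the state that step() actually uses.
    env.unwrapped.state = np.array(state)
    return env.unwrapped.state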
You have to unwrap the environment first in order to access all of its attributes.
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
env = env.unwrapped  # to access the inner functionality of the class
env.state = np.array([-0.4, 0])
print(env.state)

for i in range(50):
    obs, _, _, _ = env.step([1])  # just push right at every step
    print(obs, env.state)         # the observation and env.state are the same
    env.render()
Output:
[-0.4 0. ]
[-0.39940589 0.00059411] [-0.39940589 0.00059411]
[-0.39822183 0.00118406] [-0.39822183 0.00118406]
[-0.39645609 0.00176575] [-0.39645609 0.00176575]
[-0.39412095 0.00233513] [-0.39412095 0.00233513]
[-0.39123267 0.00288829] [-0.39123267 0.00288829]
[-0.38781124 0.00342142] [-0.38781124 0.00342142]
...
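On recent Gym releases and on Gymnasium, the same unwrapping trick should still apply, but the API has changed: reset returns (obs, info) and step returns five values. A minimal sketch, assuming Gymnasium's five-value step API:

import gymnasium as gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
env.reset()
env.unwrapped.state = np.array([-0.4, 0])  # write the inner state directly

for i in range(50):
    obs, reward, terminated, truncated, info = env.step([1])
    print(obs, env.unwrapped.state)
    if terminated or truncated:
        break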