OpenAI-Gym and Keras-RL: DQN expects a model that has one dimension for each action
I am trying to set up a Deep Q-Learning agent with a custom environment in OpenAI Gym. I have 4 continuous state variables with individual limits and 3 integer action variables with individual limits.
Here is the code:
#%% import
from gym import Env
from gym.spaces import Discrete, Box, Tuple
import numpy as np

#%%
class Custom_Env(Env):

    def __init__(self):
        # Define the state space

        # State variables
        self.state_1 = 0
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0

        # Define the gym components
        self.action_space = Box(low=np.array([0, 0, 0]), high=np.array([10, 20, 27]), dtype=np.int)
        self.observation_space = Box(low=np.array([20, -20, 0, 0]), high=np.array([22, 250, 100, 287]), dtype=np.float16)

    def step(self, action):
        # Update state variables
        self.state_1 = self.state_1 + action[0]
        self.state_2 = self.state_2 + action[1]
        self.state_3 = self.state_3 + action[2]

        # Calculate reward
        reward = self.state_1 + self.state_2 + self.state_3

        # Set placeholder for info
        info = {}

        # Check if it's the end of the day
        if self.state_4_currentTimeSlots >= 287:
            done = True
        if self.state_4_currentTimeSlots < 287:
            done = False

        # Move to the next timeslot
        self.state_4_currentTimeSlots += 1

        state = np.array([self.state_1, self.state_2, self.state_3, self.state_4_currentTimeSlots])

        # Return step information
        return state, reward, done, info

    def render(self):
        pass

    def reset(self):
        self.state_1 = 0
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0
        state = np.array([self.state_1, self.state_2, self.state_3, self.state_4_currentTimeSlots])
        return state
#%% Set up the environment
env = Custom_Env()
#%% Create a deep learning model with keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
def build_model(states, actions):
    model = Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions[0], activation='linear'))
    return model
states = env.observation_space.shape
actions = env.action_space.shape
print("env.observation_space: ", env.observation_space)
print("env.observation_space.shape : ", env.observation_space.shape )
print("action_space: ", env.action_space)
print("action_space.shape : ", env.action_space.shape )
model = build_model(states, actions)
print(model.summary())
#%% Build Agent with Keras-RL
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn
dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics = ['mae'])
dqn.fit (env, nb_steps = 4000, visualize=False, verbose = 1)
When I run this code, I get the following error message:
ValueError: Model output "Tensor("dense_23/BiasAdd:0", shape=(None, 3), dtype=float32)" has invalid shape. DQN expects a model that has one dimension for each action, in this case (3,).
which is thrown by the line dqn = DQNAgent(model = model, memory = memory, policy=policy, nb_actions=actions, nb_steps_warmup=10, target_model_update= 1e-2).
Can anyone tell me why this problem occurs and how to fix it? I assume it has something to do with the model I built, and therefore with the action and state spaces, but I could not figure out what exactly the problem is.
Reminder of the bounty: My bounty is expiring quite soon and unfortunately I still have not received any answer. If you have at least a guess how to tackle this problem, I would highly appreciate it if you shared your thoughts with me.
As we said in the comments, it seems that the Keras-rl library is no longer supported (the last update to the repository was in 2019), so it is possible that everything is now included in Keras itself. I looked at the Keras documentation and there are no high-level functions for building a reinforcement learning model, but it is possible to use the lower-level ones.
- Here is an example of how to use deep Q-learning with Keras: link (a rough sketch of that kind of lower-level loop follows below)
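For orientation, here is a minimal sketch of the kind of lower-level tf.keras Q-learning loop such an example uses. It is not taken from the linked page: the CartPole-v1 environment, layer sizes, and hyperparameters are placeholder assumptions, it follows the old Gym reset/step API used in the question, and a real agent would also need a target network and far more training steps.

# Hypothetical minimal deep Q-learning loop written directly against tf.keras,
# without keras-rl. Environment and hyperparameters are assumptions for illustration.
import random
from collections import deque

import gym
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

env = gym.make("CartPole-v1")            # toy environment with a discrete action space
n_states = env.observation_space.shape[0]
n_actions = env.action_space.n

# Q-network: one linear output per discrete action
q_net = tf.keras.Sequential([
    layers.Dense(24, activation="relu", input_shape=(n_states,)),
    layers.Dense(24, activation="relu"),
    layers.Dense(n_actions, activation="linear"),
])
q_net.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

memory = deque(maxlen=50000)             # replay buffer
gamma, epsilon = 0.99, 0.1               # discount factor, exploration rate

for episode in range(50):
    state = env.reset()                  # old Gym API: reset() returns only the observation
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_net.predict(state[None, :], verbose=0)[0]))

        next_state, reward, done, _ = env.step(action)   # old Gym API: 4-tuple
        memory.append((state, action, reward, next_state, done))
        state = next_state

        # one gradient step on a sampled minibatch
        if len(memory) >= 32:
            batch = random.sample(memory, 32)
            states_b = np.array([s for s, *_ in batch])
            next_b = np.array([ns for *_, ns, _ in batch])
            q_vals = q_net.predict(states_b, verbose=0)
            q_next = q_net.predict(next_b, verbose=0)
            for i, (_, a, r, _, d) in enumerate(batch):
                q_vals[i, a] = r if d else r + gamma * np.max(q_next[i])
            q_net.fit(states_b, q_vals, verbose=0)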
Another solution could be to downgrade to Tensorflow 1.0, since the compatibility problem seems to be caused by some changes introduced in version 2.0. I haven't tested it, but maybe Keras-rl plus Tensorflow 1.0 would work.
There is also a branch of Keras-rl that supports Tensorflow 2.0. The repository is archived, but there is a chance it will work for you.
Adding a Flatten layer before the final output solves this error. Example:
def build_model(states, actions):
    model = Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Flatten())
    model.add(Dense(actions[0], activation='linear'))
    return model
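As a quick sanity check (my own addition, not part of the original answer), you can build the patched model with the same shapes as in the question, assuming the Sequential/Dense/Flatten imports shown above, and confirm that the final layer produces one output per action:

# Illustrative check only: with states = (4,) and actions = (3,),
# the model's output shape should be (None, 3), i.e. one Q-value per action.
model = build_model(states=(4,), actions=(3,))
print(model.output_shape)   # -> (None, 3)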