OpenAI Gym:在一个动作中遍历所有可能的动作 space
OpenAI Gym: Walk through all possible actions in an action space
我想构建一种蛮力方法,在选择最佳动作之前测试 Gym 动作中的所有动作 space。是否有任何简单、直接的方法来获取所有可能的操作?
具体来说,我的行动space是
import gym
action_space = gym.spaces.MultiDiscrete([5 for _ in range(4)])
我知道我可以使用 action_space.sample()
对随机动作进行采样,并检查动作是否包含在动作 space 中,但我想生成一个包含所有可能动作的列表 [= =21=].
有没有比一堆 for 循环更优雅(和更高效)的东西? for 循环的问题是我希望它可以处理任何大小的操作 space,因此我不能硬编码 4 个 for 循环来遍历不同的操作。
健身房环境中的动作通常仅由整数表示,这意味着如果您获得可能动作的总数,则可以创建所有可能动作的数组。
在健身房环境中获得可能动作总数的方法取决于动作类型space,对于你的情况,它是一个多离散动作space,因此属性 nvec可以像上面提到的那样使用 by @Valentin Macé -:
>> print(env.action_space.nvec)
array([5, 5, 5, 5], dtype=int64)
请注意,属性 nvec 代表 n 个向量,因为它的输出是一个多维向量。另请注意,该属性是一个 numpy 数组。
现在我们有了数组,可以将其转换为动作列表列表,假设自 action_space.sample 函数 returns 一个随机函数的 numpy 数组来自MultiDiscrete action_space 即 -:
>> env.action_space.sample() # This does not return a single action but 4 actions for your case since you have a multi discrete action space of length 4.
array([2, 2, 0, 1], dtype=int64)
因此,为了将数组转换为每个维度中的可能操作列表列表,我们可以使用列表推导式 -:
>> [list(range(1, (k + 1))) for k in action_space.nvec]
[[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
请注意,这可以扩展到任意数量的维度,并且在性能方面也非常有效。
现在你可以只使用两个循环遍历每个维度中可能的操作 -:
possible_actions = [list(range(1, (k + 1))) for k in action_space.nvec]
for action_dim in possible_actions :
for action in action_dim :
# Find best action.....
pass
有关相同内容的更多信息,我希望您也访问 github 上的 this 主题,讨论了一个有点类似的问题,以防您发现相同的内容有用。
编辑:
因此,根据您 @CGFoX 的评论,我假设您想要它,以便可以将所有可能的动作组合向量生成为任意数量维度的列表,有点像 -:
>> get_actions()
[[1, 1, 1, 1], [1, 1, 1, 2] ....] # For all possible combinations.
同样可以使用递归实现,并且只需要两个循环,这也可以扩展到所提供的任意多维度。
def flatten(actions) :
# This function flattens any actions passed somewhat like so -:
# INPUT -: [[1, 2, 3], 4, 5]
# OUTPUT -: [1, 2, 3, 4, 5]
new_actions = [] # Initializing the new flattened list of actions.
for action in actions :
# Loop through the actions
if type(action) == list :
# If any actions is a pair of actions i.e. a list e.g. [1, 1] then
# add it's elements to the new_actions list.
new_actions += action
elif type(action) == int :
# If the action is an integer then append it directly to the new_actions
# list.
new_actions.append(action)
# Returns the new_actions list generated.
return new_actions
def get_actions(possible_actions) :
# This functions recieves as input the possibilities of actions for every dimension
# and returns all possible dimensional combinations for the same.
# Like so -:
# INPUT-: [[1, 2, 3, 4], [1, 2, 3, 4]] # Example for 2 dimensions but can be scaled for any.
# OUTPUT-: [[1, 1], [1, 2], [1, 3] ... [4, 1] ... [4, 4]]
if len(possible_actions) == 1 :
# If there is only one possible list of actions then it itself is the
# list containing all possible combinations and thus is returned.
return possible_actions
pairs = [] # Initializing a list to contain all pairs of actions generated.
for action in possible_actions[0] :
# Now we loop over the first set of possibilities of actions i.e. index 0
# and we make pairs of it with the second set i.e. index 1, appending each pair
# to the pairs list.
# NOTE: Incase the function is recursively called the first set of possibilities
# of actions may contain vectors and thus the newly formed pair has to be flattened.
# i.e. If a pair has already been made in previous generation like so -:
# [[[1, 1], [2, 2], [3, 3] ... ], [1, 2, 3, 4]]
# Then the pair formed will be this -: [[[1, 1], 1], [[1, 1], 2] ... ]
# But we want them to be flattened like so -: [[1, 1, 1], [1, 1, 2] ... ]
for action2 in possible_actions[1] :
pairs.append(flatten([action, action2]))
# Now we create a new list of all possible set of actions by combining the
# newly generated pairs and the sets of possibilities of actions that have not
# been paired i.e. sets other than the first and the second.
# NOTE: When we made pairs we did so only for the first two indexes and not for
# all thus to do so we make a new list with the sets that remained unpaired
# and the paired set. i.e.
# BEFORE PAIRING -: [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# AFTER PAIRING -: [[[1, 1], [1, 2] ... ], [1, 2, 3, 4]] # Notice how the third set
# i.e. the index 2 is still unpaired and first two sets have been paired.
new_possible_actions = [pairs] + possible_actions[2 : ]
# Now we recurse the function and call it within itself to make pairs for the
# left out sets, Note that since the first two sets were combined to form a paired
# first set now this set will be paired with the third set.
# This recursion will keep happening until all the sets have been paired to form
# a single set with all possible combinations.
possible_action_vectors = get_actions(new_possible_actions)
# Finally the result of the recursion is returned.
# NOTE: Only the first index is returned since now the first index contains the
# paired set of actions.
return possible_action_vectors[0]
一旦我们定义了这个函数,它就可以与我们之前生成的动作可能性集一起使用,以获得所有可能的组合,就像这样 -:
possible_actions = [list(range(1, (k + 1))) for k in action_space.nvec]
print(get_actions(possible_actions))
>> [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 1, 3], [1, 1, 1, 4], [1, 1, 1, 5], `[1, 1, 2, 1], [1, 1, 2, 2], [1, 1, 2, 3], [1, 1, 2, 4], [1, 1, 2, 5], [1, 1, 3, 1], [1, 1, 3, 2], [1, 1, 3, 3], [1, 1, 3, 4], [1, 1, 3, 5], [1, 1, 4, 1], [1, 1, 4, 2], [1, 1, 4, 3], [1, 1, 4, 4], [1, 1, 4, 5], [1, 1, 5, 1], [1, 1, 5, 2], [1, 1, 5, 3], [1, 1, 5, 4], [1, 1, 5, 5], [1, 2, 1, 1], [1, 2, 1, 2], [1, 2, 1, 3], [1, 2, 1, 4], [1, 2, 1, 5], [1, 2, 2, 1], [1, 2, 2, 2], [1, 2, 2, 3], [1, 2, 2, 4], [1, 2, 2, 5], [1, 2, 3, 1], [1, 2, 3, 2], [1, 2, 3, 3], [1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 4, 1], [1, 2, 4, 2], [1, 2, 4, 3], [1, 2, 4, 4], [1, 2, 4, 5], [1, 2, 5, 1], [1, 2, 5, 2], [1, 2, 5, 3], [1, 2, 5, 4], [1, 2, 5, 5], [1, 3, 1, 1], [1, 3, 1, 2], [1, 3, 1, 3], [1, 3, 1, 4], [1, 3, 1, 5], [1, 3, 2, 1], [1, 3, 2, 2], [1, 3, 2, 3], [1, 3, 2, 4], [1, 3, 2, 5], [1, 3, 3, 1], [1, 3, 3, 2], [1, 3, 3, 3], [1, 3, 3, 4], [1, 3, 3, 5], [1, 3, 4, 1], [1, 3, 4, 2], [1, 3, 4, 3], [1, 3, 4, 4], [1, 3, 4, 5], [1, 3, 5, 1], [1, 3, 5, 2], [1, 3, 5, 3], [1, 3, 5, 4], [1, 3, 5, 5], [1, 4, 1, 1], [1, 4, 1, 2], [1, 4, 1, 3], [1, 4, 1, 4], [1, 4, 1, 5], [1, 4, 2, 1], [1, 4, 2, 2], [1, 4, 2, 3], [1, 4, 2, 4], [1, 4, 2, 5], [1, 4, 3, 1], [1, 4, 3, 2], [1, 4, 3, 3], [1, 4, 3, 4], [1, 4, 3, 5], [1, 4, 4, 1], [1, 4, 4, 2], [1, 4, 4, 3], [1, 4, 4, 4], [1, 4, 4, 5], [1, 4, 5, 1], [1, 4, 5, 2], [1, 4, 5, 3], [1, 4, 5, 4], [1, 4, 5, 5], [1, 5, 1, 1], [1, 5, 1, 2], [1, 5, 1, 3], [1, 5, 1, 4], [1, 5, 1, 5], [1, 5, 2, 1], [1, 5, 2, 2], [1, 5, 2, 3], [1, 5, 2, 4], [1, 5, 2, 5], [1, 5, 3, 1], [1, 5, 3, 2], [1, 5, 3, 3], [1, 5, 3, 4], [1, 5, 3, 5], [1, 5, 4, 1], [1, 5, 4, 2], [1, 5, 4, 3], [1, 5, 4, 4], [1, 5, 4, 5], [1, 5, 5, 1], [1, 5, 5, 2], [1, 5, 5, 3], [1, 5, 5, 4], [1, 5, 5, 5], [2, 1, 1, 1], [2, 1, 1, 2], [2, 1, 1, 3], [2, 1, 1, 4], [2, 1, 1, 5], [2, 1, 2, 1], [2, 1, 2, 2], [2, 1, 2, 3], [2, 1, 2, 4], [2, 1, 2, 5], [2, 1, 3, 1], [2, 1, 3, 2], [2, 1, 3, 3], [2, 1, 3, 4], [2, 1, 3, 5], [2, 1, 4, 1], [2, 1, 4, 2], [2, 1, 4, 3], [2, 1, 4, 4], [2, 1, 4, 5], [2, 1, 5, 1], [2, 1, 5, 2], [2, 1, 5, 3], [2, 1, 5, 4], [2, 1, 5, 5], [2, 2, 1, 1], [2, 2, 1, 2], [2, 2, 1, 3], [2, 2, 1, 4], [2, 2, 1, 5], [2, 2, 2, 1], [2, 2, 2, 2], [2, 2, 2, 3], [2, 2, 2, 4], [2, 2, 2, 5], [2, 2, 3, 1], [2, 2, 3, 2], [2, 2, 3, 3], [2, 2, 3, 4], [2, 2, 3, 5], [2, 2, 4, 1], [2, 2, 4, 2], [2, 2, 4, 3], [2, 2, 4, 4], [2, 2, 4, 5], [2, 2, 5, 1], [2, 2, 5, 2], [2, 2, 5, 3], [2, 2, 5, 4], [2, 2, 5, 5], [2, 3, 1, 1], [2, 3, 1, 2], [2, 3, 1, 3], [2, 3, 1, 4], [2, 3, 1, 5], [2, 3, 2, 1], [2, 3, 2, 2], [2, 3, 2, 3], [2, 3, 2, 4], [2, 3, 2, 5], [2, 3, 3, 1], [2, 3, 3, 2], [2, 3, 3, 3], [2, 3, 3, 4], [2, 3, 3, 5], [2, 3, 4, 1], [2, 3, 4, 2], [2, 3, 4, 3], [2, 3, 4, 4], [2, 3, 4, 5], [2, 3, 5, 1], [2, 3, 5, 2], [2, 3, 5, 3], [2, 3, 5, 4], [2, 3, 5, 5], [2, 4, 1, 1], [2, 4, 1, 2], [2, 4, 1, 3], [2, 4, 1, 4], [2, 4, 1, 5], [2, 4, 2, 1], [2, 4, 2, 2], [2, 4, 2, 3], [2, 4, 2, 4], [2, 4, 2, 5], [2, 4, 3, 1], [2, 4, 3, 2], [2, 4, 3, 3], [2, 4, 3, 4], [2, 4, 3, 5], [2, 4, 4, 1], [2, 4, 4, 2], [2, 4, 4, 3], [2, 4, 4, 4], [2, 4, 4, 5], [2, 4, 5, 1], [2, 4, 5, 2], [2, 4, 5, 3], [2, 4, 5, 4], [2, 4, 5, 5], [2, 5, 1, 1], [2, 5, 1, 2], [2, 5, 1, 3], [2, 5, 1, 4], [2, 5, 1, 5], [2, 5, 2, 1], [2, 5, 2, 2], [2, 5, 2, 3], [2, 5, 2, 4], [2, 5, 2, 5], [2, 5, 3, 1], [2, 5, 3, 2], [2, 5, 3, 3], [2, 5, 3, 4], [2, 5, 3, 5], [2, 5, 4, 1], [2, 5, 4, 2], [2, 5, 4, 3], [2, 5, 4, 4], [2, 5, 4, 5], [2, 5, 5, 1], [2, 5, 5, 2], [2, 5, 5, 3], [2, 5, 5, 4], [2, 5, 5, 5], [3, 1, 1, 1], [3, 1, 1, 2], [3, 1, 1, 3], [3, 1, 1, 4], [3, 1, 1, 5], [3, 1, 2, 1], [3, 1, 2, 2], [3, 1, 2, 3], [3, 1, 2, 4], [3, 1, 2, 5], [3, 1, 3, 1], [3, 1, 3, 2], [3, 1, 3, 3], [3, 1, 3, 4], [3, 1, 3, 5], [3, 1, 4, 1], [3, 1, 4, 2], [3, 1, 4, 3], [3, 1, 4, 4], [3, 1, 4, 5], [3, 1, 5, 1], [3, 1, 5, 2], [3, 1, 5, 3], [3, 1, 5, 4], [3, 1, 5, 5], [3, 2, 1, 1], [3, 2, 1, 2], [3, 2, 1, 3], [3, 2, 1, 4], [3, 2, 1, 5], [3, 2, 2, 1], [3, 2, 2, 2], [3, 2, 2, 3], [3, 2, 2, 4], [3, 2, 2, 5], [3, 2, 3, 1], [3, 2, 3, 2], [3, 2, 3, 3], [3, 2, 3, 4], [3, 2, 3, 5], [3, 2, 4, 1], [3, 2, 4, 2], [3, 2, 4, 3], [3, 2, 4, 4], [3, 2, 4, 5], [3, 2, 5, 1], [3, 2, 5, 2], [3, 2, 5, 3], [3, 2, 5, 4], [3, 2, 5, 5], [3, 3, 1, 1], [3, 3, 1, 2], [3, 3, 1, 3], [3, 3, 1, 4], [3, 3, 1, 5], [3, 3, 2, 1], [3, 3, 2, 2], [3, 3, 2, 3], [3, 3, 2, 4], [3, 3, 2, 5], [3, 3, 3, 1], [3, 3, 3, 2], [3, 3, 3, 3], [3, 3, 3, 4], [3, 3, 3, 5], [3, 3, 4, 1], [3, 3, 4, 2], [3, 3, 4, 3], [3, 3, 4, 4], [3, 3, 4, 5], [3, 3, 5, 1], [3, 3, 5, 2], [3, 3, 5, 3], [3, 3, 5, 4], [3, 3, 5, 5], [3, 4, 1, 1], [3, 4, 1, 2], [3, 4, 1, 3], [3, 4, 1, 4], [3, 4, 1, 5], [3, 4, 2, 1], [3, 4, 2, 2], [3, 4, 2, 3], [3, 4, 2, 4], [3, 4, 2, 5], [3, 4, 3, 1], [3, 4, 3, 2], [3, 4, 3, 3], [3, 4, 3, 4], [3, 4, 3, 5], [3, 4, 4, 1], [3, 4, 4, 2], [3, 4, 4, 3], [3, 4, 4, 4], [3, 4, 4, 5], [3, 4, 5, 1], [3, 4, 5, 2], [3, 4, 5, 3], [3, 4, 5, 4], [3, 4, 5, 5], [3, 5, 1, 1], [3, 5, 1, 2], [3, 5, 1, 3], [3, 5, 1, 4], [3, 5, 1, 5], [3, 5, 2, 1], [3, 5, 2, 2], [3, 5, 2, 3], [3, 5, 2, 4], [3, 5, 2, 5], [3, 5, 3, 1], [3, 5, 3, 2], [3, 5, 3, 3], [3, 5, 3, 4], [3, 5, 3, 5], [3, 5, 4, 1], [3, 5, 4, 2], [3, 5, 4, 3], [3, 5, 4, 4], [3, 5, 4, 5], [3, 5, 5, 1], [3, 5, 5, 2], [3, 5, 5, 3], [3, 5, 5, 4], [3, 5, 5, 5], [4, 1, 1, 1], [4, 1, 1, 2], [4, 1, 1, 3], [4, 1, 1, 4], [4, 1, 1, 5], [4, 1, 2, 1], [4, 1, 2, 2], [4, 1, 2, 3], [4, 1, 2, 4], [4, 1, 2, 5], [4, 1, 3, 1], [4, 1, 3, 2], [4, 1, 3, 3], [4, 1, 3, 4], [4, 1, 3, 5], [4, 1, 4, 1], [4, 1, 4, 2], [4, 1, 4, 3], [4, 1, 4, 4], [4, 1, 4, 5], [4, 1, 5, 1], [4, 1, 5, 2], [4, 1, 5, 3], [4, 1, 5, 4], [4, 1, 5, 5], [4, 2, 1, 1], [4, 2, 1, 2], [4, 2, 1, 3], [4, 2, 1, 4], [4, 2, 1, 5], [4, 2, 2, 1], [4, 2, 2, 2], [4, 2, 2, 3], [4, 2, 2, 4], [4, 2, 2, 5], [4, 2, 3, 1], [4, 2, 3, 2], [4, 2, 3, 3], [4, 2, 3, 4], [4, 2, 3, 5], [4, 2, 4, 1], [4, 2, 4, 2], [4, 2, 4, 3], [4, 2, 4, 4], [4, 2, 4, 5], [4, 2, 5, 1], [4, 2, 5, 2], [4, 2, 5, 3], [4, 2, 5, 4], [4, 2, 5, 5], [4, 3, 1, 1], [4, 3, 1, 2], [4, 3, 1, 3], [4, 3, 1, 4], [4, 3, 1, 5], [4, 3, 2, 1], [4, 3, 2, 2], [4, 3, 2, 3], [4, 3, 2, 4], [4, 3, 2, 5], [4, 3, 3, 1], [4, 3, 3, 2], [4, 3, 3, 3], [4, 3, 3, 4], [4, 3, 3, 5], [4, 3, 4, 1], [4, 3, 4, 2], [4, 3, 4, 3], [4, 3, 4, 4], [4, 3, 4, 5], [4, 3, 5, 1], [4, 3, 5, 2], [4, 3, 5, 3], [4, 3, 5, 4], [4, 3, 5, 5], [4, 4, 1, 1], [4, 4, 1, 2], [4, 4, 1, 3], [4, 4, 1, 4], [4, 4, 1, 5], [4, 4, 2, 1], [4, 4, 2, 2], [4, 4, 2, 3], [4, 4, 2, 4], [4, 4, 2, 5], [4, 4, 3, 1], [4, 4, 3, 2], [4, 4, 3, 3], [4, 4, 3, 4], [4, 4, 3, 5], [4, 4, 4, 1], [4, 4, 4, 2], [4, 4, 4, 3], [4, 4, 4, 4], [4, 4, 4, 5], [4, 4, 5, 1], [4, 4, 5, 2], [4, 4, 5, 3], [4, 4, 5, 4], [4, 4, 5, 5], [4, 5, 1, 1], [4, 5, 1, 2], [4, 5, 1, 3], [4, 5, 1, 4], [4, 5, 1, 5], [4, 5, 2, 1], [4, 5, 2, 2], [4, 5, 2, 3], [4, 5, 2, 4], [4, 5, 2, 5], [4, 5, 3, 1], [4, 5, 3, 2], [4, 5, 3, 3], [4, 5, 3, 4], [4, 5, 3, 5], [4, 5, 4, 1], [4, 5, 4, 2], [4, 5, 4, 3], [4, 5, 4, 4], [4, 5, 4, 5], [4, 5, 5, 1], [4, 5, 5, 2], [4, 5, 5, 3], [4, 5, 5, 4], [4, 5, 5, 5], [5, 1, 1, 1], [5, 1, 1, 2], [5, 1, 1, 3], [5, 1, 1, 4], [5, 1, 1, 5], [5, 1, 2, 1], [5, 1, 2, 2], [5, 1, 2, 3], [5, 1, 2, 4], [5, 1, 2, 5], [5, 1, 3, 1], [5, 1, 3, 2], [5, 1, 3, 3], [5, 1, 3, 4], [5, 1, 3, 5], [5, 1, 4, 1], [5, 1, 4, 2], [5, 1, 4, 3], [5, 1, 4, 4], [5, 1, 4, 5], [5, 1, 5, 1], [5, 1, 5, 2], [5, 1, 5, 3], [5, 1, 5, 4], [5, 1, 5, 5], [5, 2, 1, 1], [5, 2, 1, 2], [5, 2, 1, 3], [5, 2, 1, 4], [5, 2, 1, 5], [5, 2, 2, 1], [5, 2, 2, 2], [5, 2, 2, 3], [5, 2, 2, 4], [5, 2, 2, 5], [5, 2, 3, 1], [5, 2, 3, 2], [5, 2, 3, 3], [5, 2, 3, 4], [5, 2, 3, 5], [5, 2, 4, 1], [5, 2, 4, 2], [5, 2, 4, 3], [5, 2, 4, 4], [5, 2, 4, 5], [5, 2, 5, 1], [5, 2, 5, 2], [5, 2, 5, 3], [5, 2, 5, 4], [5, 2, 5, 5], [5, 3, 1, 1], [5, 3, 1, 2], [5, 3, 1, 3], [5, 3, 1, 4], [5, 3, 1, 5], [5, 3, 2, 1], [5, 3, 2, 2], [5, 3, 2, 3], [5, 3, 2, 4], [5, 3, 2, 5], [5, 3, 3, 1], [5, 3, 3, 2], [5, 3, 3, 3], [5, 3, 3, 4], [5, 3, 3, 5], [5, 3, 4, 1], [5, 3, 4, 2], [5, 3, 4, 3], [5, 3, 4, 4], [5, 3, 4, 5], [5, 3, 5, 1], [5, 3, 5, 2], [5, 3, 5, 3], [5, 3, 5, 4], [5, 3, 5, 5], [5, 4, 1, 1], [5, 4, 1, 2], [5, 4, 1, 3], [5, 4, 1, 4], [5, 4, 1, 5], [5, 4, 2, 1], [5, 4, 2, 2], [5, 4, 2, 3], [5, 4, 2, 4], [5, 4, 2, 5], [5, 4, 3, 1], [5, 4, 3, 2], [5, 4, 3, 3], [5, 4, 3, 4], [5, 4, 3, 5], [5, 4, 4, 1], [5, 4, 4, 2], [5, 4, 4, 3], [5, 4, 4, 4], [5, 4, 4, 5], [5, 4, 5, 1], [5, 4, 5, 2], [5, 4, 5, 3], [5, 4, 5, 4], [5, 4, 5, 5], [5, 5, 1, 1], [5, 5, 1, 2], [5, 5, 1, 3], [5, 5, 1, 4], [5, 5, 1, 5], [5, 5, 2, 1], [5, 5, 2, 2], [5, 5, 2, 3], [5, 5, 2, 4], [5, 5, 2, 5], [5, 5, 3, 1], [5, 5, 3, 2], [5, 5, 3, 3], [5, 5, 3, 4], [5, 5, 3, 5], [5, 5, 4, 1], [5, 5, 4, 2], [5, 5, 4, 3], [5, 5, 4, 4], [5, 5, 4, 5], [5, 5, 5, 1], [5, 5, 5, 2], [5, 5, 5, 3], [5, 5, 5, 4], [5, 5, 5, 5]]
EDIT-2:我修复了一些以前返回嵌套列表的代码,现在返回的列表是成对的列表,而不是嵌套在另一个列表中。
EDIT-3-:修正了我的拼写错误。
也可以使用下面的函数,分别列出观察或动作中的所有状态或动作 space。
def get_space_list(space):
"""
Converts gym `space`, constructed from `types`, to list `space_list`
"""
# -------------------------------- #
types = [
gym.spaces.multi_binary.MultiBinary,
gym.spaces.discrete.Discrete,
gym.spaces.multi_discrete.MultiDiscrete,
gym.spaces.dict.Dict,
gym.spaces.tuple.Tuple,
]
if type(space) not in types:
raise ValueError(f'input space {space} is not constructed from spaces of types:' + '\n' + str(types))
# -------------------------------- #
if type(space) is gym.spaces.multi_binary.MultiBinary:
return [
np.reshape(np.array(element), space.n)
for element in itertools.product(
*[range(2)] * np.prod(space.n)
)
]
if type(space) is gym.spaces.discrete.Discrete:
return list(range(space.n))
if type(space) is gym.spaces.multi_discrete.MultiDiscrete:
return [
np.array(element) for element in itertools.product(
*[range(n) for n in space.nvec]
)
]
if type(space) is gym.spaces.dict.Dict:
keys = space.spaces.keys()
values_list = itertools.product(
*[get_space_list(sub_space) for sub_space in space.spaces.values()]
)
return [
{key: value for key, value in zip(keys, values)}
for values in values_list
]
return space_list
if type(space) is gym.spaces.tuple.Tuple:
return [
list(element) for element in itertools.product(
*[get_space_list(sub_space) for sub_space in space.spaces]
)
]
# -------------------------------- #
我想构建一种蛮力方法,在选择最佳动作之前测试 Gym 动作中的所有动作 space。是否有任何简单、直接的方法来获取所有可能的操作?
具体来说,我的行动space是
import gym
action_space = gym.spaces.MultiDiscrete([5 for _ in range(4)])
我知道我可以使用 action_space.sample()
对随机动作进行采样,并检查动作是否包含在动作 space 中,但我想生成一个包含所有可能动作的列表 [= =21=].
有没有比一堆 for 循环更优雅(和更高效)的东西? for 循环的问题是我希望它可以处理任何大小的操作 space,因此我不能硬编码 4 个 for 循环来遍历不同的操作。
健身房环境中的动作通常仅由整数表示,这意味着如果您获得可能动作的总数,则可以创建所有可能动作的数组。
在健身房环境中获得可能动作总数的方法取决于动作类型space,对于你的情况,它是一个多离散动作space,因此属性 nvec可以像上面提到的那样使用
>> print(env.action_space.nvec)
array([5, 5, 5, 5], dtype=int64)
请注意,属性 nvec 代表 n 个向量,因为它的输出是一个多维向量。另请注意,该属性是一个 numpy 数组。
现在我们有了数组,可以将其转换为动作列表列表,假设自 action_space.sample 函数 returns 一个随机函数的 numpy 数组来自MultiDiscrete action_space 即 -:
>> env.action_space.sample() # This does not return a single action but 4 actions for your case since you have a multi discrete action space of length 4.
array([2, 2, 0, 1], dtype=int64)
因此,为了将数组转换为每个维度中的可能操作列表列表,我们可以使用列表推导式 -:
>> [list(range(1, (k + 1))) for k in action_space.nvec]
[[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
请注意,这可以扩展到任意数量的维度,并且在性能方面也非常有效。
现在你可以只使用两个循环遍历每个维度中可能的操作 -:
possible_actions = [list(range(1, (k + 1))) for k in action_space.nvec]
for action_dim in possible_actions :
for action in action_dim :
# Find best action.....
pass
有关相同内容的更多信息,我希望您也访问 github 上的 this 主题,讨论了一个有点类似的问题,以防您发现相同的内容有用。
编辑: 因此,根据您 @CGFoX 的评论,我假设您想要它,以便可以将所有可能的动作组合向量生成为任意数量维度的列表,有点像 -:
>> get_actions()
[[1, 1, 1, 1], [1, 1, 1, 2] ....] # For all possible combinations.
同样可以使用递归实现,并且只需要两个循环,这也可以扩展到所提供的任意多维度。
def flatten(actions) :
# This function flattens any actions passed somewhat like so -:
# INPUT -: [[1, 2, 3], 4, 5]
# OUTPUT -: [1, 2, 3, 4, 5]
new_actions = [] # Initializing the new flattened list of actions.
for action in actions :
# Loop through the actions
if type(action) == list :
# If any actions is a pair of actions i.e. a list e.g. [1, 1] then
# add it's elements to the new_actions list.
new_actions += action
elif type(action) == int :
# If the action is an integer then append it directly to the new_actions
# list.
new_actions.append(action)
# Returns the new_actions list generated.
return new_actions
def get_actions(possible_actions) :
# This functions recieves as input the possibilities of actions for every dimension
# and returns all possible dimensional combinations for the same.
# Like so -:
# INPUT-: [[1, 2, 3, 4], [1, 2, 3, 4]] # Example for 2 dimensions but can be scaled for any.
# OUTPUT-: [[1, 1], [1, 2], [1, 3] ... [4, 1] ... [4, 4]]
if len(possible_actions) == 1 :
# If there is only one possible list of actions then it itself is the
# list containing all possible combinations and thus is returned.
return possible_actions
pairs = [] # Initializing a list to contain all pairs of actions generated.
for action in possible_actions[0] :
# Now we loop over the first set of possibilities of actions i.e. index 0
# and we make pairs of it with the second set i.e. index 1, appending each pair
# to the pairs list.
# NOTE: Incase the function is recursively called the first set of possibilities
# of actions may contain vectors and thus the newly formed pair has to be flattened.
# i.e. If a pair has already been made in previous generation like so -:
# [[[1, 1], [2, 2], [3, 3] ... ], [1, 2, 3, 4]]
# Then the pair formed will be this -: [[[1, 1], 1], [[1, 1], 2] ... ]
# But we want them to be flattened like so -: [[1, 1, 1], [1, 1, 2] ... ]
for action2 in possible_actions[1] :
pairs.append(flatten([action, action2]))
# Now we create a new list of all possible set of actions by combining the
# newly generated pairs and the sets of possibilities of actions that have not
# been paired i.e. sets other than the first and the second.
# NOTE: When we made pairs we did so only for the first two indexes and not for
# all thus to do so we make a new list with the sets that remained unpaired
# and the paired set. i.e.
# BEFORE PAIRING -: [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# AFTER PAIRING -: [[[1, 1], [1, 2] ... ], [1, 2, 3, 4]] # Notice how the third set
# i.e. the index 2 is still unpaired and first two sets have been paired.
new_possible_actions = [pairs] + possible_actions[2 : ]
# Now we recurse the function and call it within itself to make pairs for the
# left out sets, Note that since the first two sets were combined to form a paired
# first set now this set will be paired with the third set.
# This recursion will keep happening until all the sets have been paired to form
# a single set with all possible combinations.
possible_action_vectors = get_actions(new_possible_actions)
# Finally the result of the recursion is returned.
# NOTE: Only the first index is returned since now the first index contains the
# paired set of actions.
return possible_action_vectors[0]
一旦我们定义了这个函数,它就可以与我们之前生成的动作可能性集一起使用,以获得所有可能的组合,就像这样 -:
possible_actions = [list(range(1, (k + 1))) for k in action_space.nvec]
print(get_actions(possible_actions))
>> [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 1, 3], [1, 1, 1, 4], [1, 1, 1, 5], `[1, 1, 2, 1], [1, 1, 2, 2], [1, 1, 2, 3], [1, 1, 2, 4], [1, 1, 2, 5], [1, 1, 3, 1], [1, 1, 3, 2], [1, 1, 3, 3], [1, 1, 3, 4], [1, 1, 3, 5], [1, 1, 4, 1], [1, 1, 4, 2], [1, 1, 4, 3], [1, 1, 4, 4], [1, 1, 4, 5], [1, 1, 5, 1], [1, 1, 5, 2], [1, 1, 5, 3], [1, 1, 5, 4], [1, 1, 5, 5], [1, 2, 1, 1], [1, 2, 1, 2], [1, 2, 1, 3], [1, 2, 1, 4], [1, 2, 1, 5], [1, 2, 2, 1], [1, 2, 2, 2], [1, 2, 2, 3], [1, 2, 2, 4], [1, 2, 2, 5], [1, 2, 3, 1], [1, 2, 3, 2], [1, 2, 3, 3], [1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 4, 1], [1, 2, 4, 2], [1, 2, 4, 3], [1, 2, 4, 4], [1, 2, 4, 5], [1, 2, 5, 1], [1, 2, 5, 2], [1, 2, 5, 3], [1, 2, 5, 4], [1, 2, 5, 5], [1, 3, 1, 1], [1, 3, 1, 2], [1, 3, 1, 3], [1, 3, 1, 4], [1, 3, 1, 5], [1, 3, 2, 1], [1, 3, 2, 2], [1, 3, 2, 3], [1, 3, 2, 4], [1, 3, 2, 5], [1, 3, 3, 1], [1, 3, 3, 2], [1, 3, 3, 3], [1, 3, 3, 4], [1, 3, 3, 5], [1, 3, 4, 1], [1, 3, 4, 2], [1, 3, 4, 3], [1, 3, 4, 4], [1, 3, 4, 5], [1, 3, 5, 1], [1, 3, 5, 2], [1, 3, 5, 3], [1, 3, 5, 4], [1, 3, 5, 5], [1, 4, 1, 1], [1, 4, 1, 2], [1, 4, 1, 3], [1, 4, 1, 4], [1, 4, 1, 5], [1, 4, 2, 1], [1, 4, 2, 2], [1, 4, 2, 3], [1, 4, 2, 4], [1, 4, 2, 5], [1, 4, 3, 1], [1, 4, 3, 2], [1, 4, 3, 3], [1, 4, 3, 4], [1, 4, 3, 5], [1, 4, 4, 1], [1, 4, 4, 2], [1, 4, 4, 3], [1, 4, 4, 4], [1, 4, 4, 5], [1, 4, 5, 1], [1, 4, 5, 2], [1, 4, 5, 3], [1, 4, 5, 4], [1, 4, 5, 5], [1, 5, 1, 1], [1, 5, 1, 2], [1, 5, 1, 3], [1, 5, 1, 4], [1, 5, 1, 5], [1, 5, 2, 1], [1, 5, 2, 2], [1, 5, 2, 3], [1, 5, 2, 4], [1, 5, 2, 5], [1, 5, 3, 1], [1, 5, 3, 2], [1, 5, 3, 3], [1, 5, 3, 4], [1, 5, 3, 5], [1, 5, 4, 1], [1, 5, 4, 2], [1, 5, 4, 3], [1, 5, 4, 4], [1, 5, 4, 5], [1, 5, 5, 1], [1, 5, 5, 2], [1, 5, 5, 3], [1, 5, 5, 4], [1, 5, 5, 5], [2, 1, 1, 1], [2, 1, 1, 2], [2, 1, 1, 3], [2, 1, 1, 4], [2, 1, 1, 5], [2, 1, 2, 1], [2, 1, 2, 2], [2, 1, 2, 3], [2, 1, 2, 4], [2, 1, 2, 5], [2, 1, 3, 1], [2, 1, 3, 2], [2, 1, 3, 3], [2, 1, 3, 4], [2, 1, 3, 5], [2, 1, 4, 1], [2, 1, 4, 2], [2, 1, 4, 3], [2, 1, 4, 4], [2, 1, 4, 5], [2, 1, 5, 1], [2, 1, 5, 2], [2, 1, 5, 3], [2, 1, 5, 4], [2, 1, 5, 5], [2, 2, 1, 1], [2, 2, 1, 2], [2, 2, 1, 3], [2, 2, 1, 4], [2, 2, 1, 5], [2, 2, 2, 1], [2, 2, 2, 2], [2, 2, 2, 3], [2, 2, 2, 4], [2, 2, 2, 5], [2, 2, 3, 1], [2, 2, 3, 2], [2, 2, 3, 3], [2, 2, 3, 4], [2, 2, 3, 5], [2, 2, 4, 1], [2, 2, 4, 2], [2, 2, 4, 3], [2, 2, 4, 4], [2, 2, 4, 5], [2, 2, 5, 1], [2, 2, 5, 2], [2, 2, 5, 3], [2, 2, 5, 4], [2, 2, 5, 5], [2, 3, 1, 1], [2, 3, 1, 2], [2, 3, 1, 3], [2, 3, 1, 4], [2, 3, 1, 5], [2, 3, 2, 1], [2, 3, 2, 2], [2, 3, 2, 3], [2, 3, 2, 4], [2, 3, 2, 5], [2, 3, 3, 1], [2, 3, 3, 2], [2, 3, 3, 3], [2, 3, 3, 4], [2, 3, 3, 5], [2, 3, 4, 1], [2, 3, 4, 2], [2, 3, 4, 3], [2, 3, 4, 4], [2, 3, 4, 5], [2, 3, 5, 1], [2, 3, 5, 2], [2, 3, 5, 3], [2, 3, 5, 4], [2, 3, 5, 5], [2, 4, 1, 1], [2, 4, 1, 2], [2, 4, 1, 3], [2, 4, 1, 4], [2, 4, 1, 5], [2, 4, 2, 1], [2, 4, 2, 2], [2, 4, 2, 3], [2, 4, 2, 4], [2, 4, 2, 5], [2, 4, 3, 1], [2, 4, 3, 2], [2, 4, 3, 3], [2, 4, 3, 4], [2, 4, 3, 5], [2, 4, 4, 1], [2, 4, 4, 2], [2, 4, 4, 3], [2, 4, 4, 4], [2, 4, 4, 5], [2, 4, 5, 1], [2, 4, 5, 2], [2, 4, 5, 3], [2, 4, 5, 4], [2, 4, 5, 5], [2, 5, 1, 1], [2, 5, 1, 2], [2, 5, 1, 3], [2, 5, 1, 4], [2, 5, 1, 5], [2, 5, 2, 1], [2, 5, 2, 2], [2, 5, 2, 3], [2, 5, 2, 4], [2, 5, 2, 5], [2, 5, 3, 1], [2, 5, 3, 2], [2, 5, 3, 3], [2, 5, 3, 4], [2, 5, 3, 5], [2, 5, 4, 1], [2, 5, 4, 2], [2, 5, 4, 3], [2, 5, 4, 4], [2, 5, 4, 5], [2, 5, 5, 1], [2, 5, 5, 2], [2, 5, 5, 3], [2, 5, 5, 4], [2, 5, 5, 5], [3, 1, 1, 1], [3, 1, 1, 2], [3, 1, 1, 3], [3, 1, 1, 4], [3, 1, 1, 5], [3, 1, 2, 1], [3, 1, 2, 2], [3, 1, 2, 3], [3, 1, 2, 4], [3, 1, 2, 5], [3, 1, 3, 1], [3, 1, 3, 2], [3, 1, 3, 3], [3, 1, 3, 4], [3, 1, 3, 5], [3, 1, 4, 1], [3, 1, 4, 2], [3, 1, 4, 3], [3, 1, 4, 4], [3, 1, 4, 5], [3, 1, 5, 1], [3, 1, 5, 2], [3, 1, 5, 3], [3, 1, 5, 4], [3, 1, 5, 5], [3, 2, 1, 1], [3, 2, 1, 2], [3, 2, 1, 3], [3, 2, 1, 4], [3, 2, 1, 5], [3, 2, 2, 1], [3, 2, 2, 2], [3, 2, 2, 3], [3, 2, 2, 4], [3, 2, 2, 5], [3, 2, 3, 1], [3, 2, 3, 2], [3, 2, 3, 3], [3, 2, 3, 4], [3, 2, 3, 5], [3, 2, 4, 1], [3, 2, 4, 2], [3, 2, 4, 3], [3, 2, 4, 4], [3, 2, 4, 5], [3, 2, 5, 1], [3, 2, 5, 2], [3, 2, 5, 3], [3, 2, 5, 4], [3, 2, 5, 5], [3, 3, 1, 1], [3, 3, 1, 2], [3, 3, 1, 3], [3, 3, 1, 4], [3, 3, 1, 5], [3, 3, 2, 1], [3, 3, 2, 2], [3, 3, 2, 3], [3, 3, 2, 4], [3, 3, 2, 5], [3, 3, 3, 1], [3, 3, 3, 2], [3, 3, 3, 3], [3, 3, 3, 4], [3, 3, 3, 5], [3, 3, 4, 1], [3, 3, 4, 2], [3, 3, 4, 3], [3, 3, 4, 4], [3, 3, 4, 5], [3, 3, 5, 1], [3, 3, 5, 2], [3, 3, 5, 3], [3, 3, 5, 4], [3, 3, 5, 5], [3, 4, 1, 1], [3, 4, 1, 2], [3, 4, 1, 3], [3, 4, 1, 4], [3, 4, 1, 5], [3, 4, 2, 1], [3, 4, 2, 2], [3, 4, 2, 3], [3, 4, 2, 4], [3, 4, 2, 5], [3, 4, 3, 1], [3, 4, 3, 2], [3, 4, 3, 3], [3, 4, 3, 4], [3, 4, 3, 5], [3, 4, 4, 1], [3, 4, 4, 2], [3, 4, 4, 3], [3, 4, 4, 4], [3, 4, 4, 5], [3, 4, 5, 1], [3, 4, 5, 2], [3, 4, 5, 3], [3, 4, 5, 4], [3, 4, 5, 5], [3, 5, 1, 1], [3, 5, 1, 2], [3, 5, 1, 3], [3, 5, 1, 4], [3, 5, 1, 5], [3, 5, 2, 1], [3, 5, 2, 2], [3, 5, 2, 3], [3, 5, 2, 4], [3, 5, 2, 5], [3, 5, 3, 1], [3, 5, 3, 2], [3, 5, 3, 3], [3, 5, 3, 4], [3, 5, 3, 5], [3, 5, 4, 1], [3, 5, 4, 2], [3, 5, 4, 3], [3, 5, 4, 4], [3, 5, 4, 5], [3, 5, 5, 1], [3, 5, 5, 2], [3, 5, 5, 3], [3, 5, 5, 4], [3, 5, 5, 5], [4, 1, 1, 1], [4, 1, 1, 2], [4, 1, 1, 3], [4, 1, 1, 4], [4, 1, 1, 5], [4, 1, 2, 1], [4, 1, 2, 2], [4, 1, 2, 3], [4, 1, 2, 4], [4, 1, 2, 5], [4, 1, 3, 1], [4, 1, 3, 2], [4, 1, 3, 3], [4, 1, 3, 4], [4, 1, 3, 5], [4, 1, 4, 1], [4, 1, 4, 2], [4, 1, 4, 3], [4, 1, 4, 4], [4, 1, 4, 5], [4, 1, 5, 1], [4, 1, 5, 2], [4, 1, 5, 3], [4, 1, 5, 4], [4, 1, 5, 5], [4, 2, 1, 1], [4, 2, 1, 2], [4, 2, 1, 3], [4, 2, 1, 4], [4, 2, 1, 5], [4, 2, 2, 1], [4, 2, 2, 2], [4, 2, 2, 3], [4, 2, 2, 4], [4, 2, 2, 5], [4, 2, 3, 1], [4, 2, 3, 2], [4, 2, 3, 3], [4, 2, 3, 4], [4, 2, 3, 5], [4, 2, 4, 1], [4, 2, 4, 2], [4, 2, 4, 3], [4, 2, 4, 4], [4, 2, 4, 5], [4, 2, 5, 1], [4, 2, 5, 2], [4, 2, 5, 3], [4, 2, 5, 4], [4, 2, 5, 5], [4, 3, 1, 1], [4, 3, 1, 2], [4, 3, 1, 3], [4, 3, 1, 4], [4, 3, 1, 5], [4, 3, 2, 1], [4, 3, 2, 2], [4, 3, 2, 3], [4, 3, 2, 4], [4, 3, 2, 5], [4, 3, 3, 1], [4, 3, 3, 2], [4, 3, 3, 3], [4, 3, 3, 4], [4, 3, 3, 5], [4, 3, 4, 1], [4, 3, 4, 2], [4, 3, 4, 3], [4, 3, 4, 4], [4, 3, 4, 5], [4, 3, 5, 1], [4, 3, 5, 2], [4, 3, 5, 3], [4, 3, 5, 4], [4, 3, 5, 5], [4, 4, 1, 1], [4, 4, 1, 2], [4, 4, 1, 3], [4, 4, 1, 4], [4, 4, 1, 5], [4, 4, 2, 1], [4, 4, 2, 2], [4, 4, 2, 3], [4, 4, 2, 4], [4, 4, 2, 5], [4, 4, 3, 1], [4, 4, 3, 2], [4, 4, 3, 3], [4, 4, 3, 4], [4, 4, 3, 5], [4, 4, 4, 1], [4, 4, 4, 2], [4, 4, 4, 3], [4, 4, 4, 4], [4, 4, 4, 5], [4, 4, 5, 1], [4, 4, 5, 2], [4, 4, 5, 3], [4, 4, 5, 4], [4, 4, 5, 5], [4, 5, 1, 1], [4, 5, 1, 2], [4, 5, 1, 3], [4, 5, 1, 4], [4, 5, 1, 5], [4, 5, 2, 1], [4, 5, 2, 2], [4, 5, 2, 3], [4, 5, 2, 4], [4, 5, 2, 5], [4, 5, 3, 1], [4, 5, 3, 2], [4, 5, 3, 3], [4, 5, 3, 4], [4, 5, 3, 5], [4, 5, 4, 1], [4, 5, 4, 2], [4, 5, 4, 3], [4, 5, 4, 4], [4, 5, 4, 5], [4, 5, 5, 1], [4, 5, 5, 2], [4, 5, 5, 3], [4, 5, 5, 4], [4, 5, 5, 5], [5, 1, 1, 1], [5, 1, 1, 2], [5, 1, 1, 3], [5, 1, 1, 4], [5, 1, 1, 5], [5, 1, 2, 1], [5, 1, 2, 2], [5, 1, 2, 3], [5, 1, 2, 4], [5, 1, 2, 5], [5, 1, 3, 1], [5, 1, 3, 2], [5, 1, 3, 3], [5, 1, 3, 4], [5, 1, 3, 5], [5, 1, 4, 1], [5, 1, 4, 2], [5, 1, 4, 3], [5, 1, 4, 4], [5, 1, 4, 5], [5, 1, 5, 1], [5, 1, 5, 2], [5, 1, 5, 3], [5, 1, 5, 4], [5, 1, 5, 5], [5, 2, 1, 1], [5, 2, 1, 2], [5, 2, 1, 3], [5, 2, 1, 4], [5, 2, 1, 5], [5, 2, 2, 1], [5, 2, 2, 2], [5, 2, 2, 3], [5, 2, 2, 4], [5, 2, 2, 5], [5, 2, 3, 1], [5, 2, 3, 2], [5, 2, 3, 3], [5, 2, 3, 4], [5, 2, 3, 5], [5, 2, 4, 1], [5, 2, 4, 2], [5, 2, 4, 3], [5, 2, 4, 4], [5, 2, 4, 5], [5, 2, 5, 1], [5, 2, 5, 2], [5, 2, 5, 3], [5, 2, 5, 4], [5, 2, 5, 5], [5, 3, 1, 1], [5, 3, 1, 2], [5, 3, 1, 3], [5, 3, 1, 4], [5, 3, 1, 5], [5, 3, 2, 1], [5, 3, 2, 2], [5, 3, 2, 3], [5, 3, 2, 4], [5, 3, 2, 5], [5, 3, 3, 1], [5, 3, 3, 2], [5, 3, 3, 3], [5, 3, 3, 4], [5, 3, 3, 5], [5, 3, 4, 1], [5, 3, 4, 2], [5, 3, 4, 3], [5, 3, 4, 4], [5, 3, 4, 5], [5, 3, 5, 1], [5, 3, 5, 2], [5, 3, 5, 3], [5, 3, 5, 4], [5, 3, 5, 5], [5, 4, 1, 1], [5, 4, 1, 2], [5, 4, 1, 3], [5, 4, 1, 4], [5, 4, 1, 5], [5, 4, 2, 1], [5, 4, 2, 2], [5, 4, 2, 3], [5, 4, 2, 4], [5, 4, 2, 5], [5, 4, 3, 1], [5, 4, 3, 2], [5, 4, 3, 3], [5, 4, 3, 4], [5, 4, 3, 5], [5, 4, 4, 1], [5, 4, 4, 2], [5, 4, 4, 3], [5, 4, 4, 4], [5, 4, 4, 5], [5, 4, 5, 1], [5, 4, 5, 2], [5, 4, 5, 3], [5, 4, 5, 4], [5, 4, 5, 5], [5, 5, 1, 1], [5, 5, 1, 2], [5, 5, 1, 3], [5, 5, 1, 4], [5, 5, 1, 5], [5, 5, 2, 1], [5, 5, 2, 2], [5, 5, 2, 3], [5, 5, 2, 4], [5, 5, 2, 5], [5, 5, 3, 1], [5, 5, 3, 2], [5, 5, 3, 3], [5, 5, 3, 4], [5, 5, 3, 5], [5, 5, 4, 1], [5, 5, 4, 2], [5, 5, 4, 3], [5, 5, 4, 4], [5, 5, 4, 5], [5, 5, 5, 1], [5, 5, 5, 2], [5, 5, 5, 3], [5, 5, 5, 4], [5, 5, 5, 5]]
EDIT-2:我修复了一些以前返回嵌套列表的代码,现在返回的列表是成对的列表,而不是嵌套在另一个列表中。
EDIT-3-:修正了我的拼写错误。
也可以使用下面的函数,分别列出观察或动作中的所有状态或动作 space。
def get_space_list(space):
"""
Converts gym `space`, constructed from `types`, to list `space_list`
"""
# -------------------------------- #
types = [
gym.spaces.multi_binary.MultiBinary,
gym.spaces.discrete.Discrete,
gym.spaces.multi_discrete.MultiDiscrete,
gym.spaces.dict.Dict,
gym.spaces.tuple.Tuple,
]
if type(space) not in types:
raise ValueError(f'input space {space} is not constructed from spaces of types:' + '\n' + str(types))
# -------------------------------- #
if type(space) is gym.spaces.multi_binary.MultiBinary:
return [
np.reshape(np.array(element), space.n)
for element in itertools.product(
*[range(2)] * np.prod(space.n)
)
]
if type(space) is gym.spaces.discrete.Discrete:
return list(range(space.n))
if type(space) is gym.spaces.multi_discrete.MultiDiscrete:
return [
np.array(element) for element in itertools.product(
*[range(n) for n in space.nvec]
)
]
if type(space) is gym.spaces.dict.Dict:
keys = space.spaces.keys()
values_list = itertools.product(
*[get_space_list(sub_space) for sub_space in space.spaces.values()]
)
return [
{key: value for key, value in zip(keys, values)}
for values in values_list
]
return space_list
if type(space) is gym.spaces.tuple.Tuple:
return [
list(element) for element in itertools.product(
*[get_space_list(sub_space) for sub_space in space.spaces]
)
]
# -------------------------------- #