OpenAI Gym：在一个动作中遍历所有可能的动作 space

Question

我想构建一种蛮力方法，在选择最佳动作之前测试 Gym 动作中的所有动作 space。是否有任何简单、直接的方法来获取所有可能的操作？

具体来说，我的行动space是

import gym

action_space = gym.spaces.MultiDiscrete([5 for _ in range(4)])

我知道我可以使用 action_space.sample() 对随机动作进行采样，并检查动作是否包含在动作 space 中，但我想生成一个包含所有可能动作的列表 [= =21=].

有没有比一堆 for 循环更优雅（和更高效）的东西？ for 循环的问题是我希望它可以处理任何大小的操作 space，因此我不能硬编码 4 个 for 循环来遍历不同的操作。

Answer 1

健身房环境中的动作通常仅由整数表示，这意味着如果您获得可能动作的总数，则可以创建所有可能动作的数组。

在健身房环境中获得可能动作总数的方法取决于动作类型space，对于你的情况，它是一个多离散动作space，因此属性 nvec可以像上面提到的那样使用 by @Valentin Macé -:

>> print(env.action_space.nvec)
array([5, 5, 5, 5], dtype=int64)

请注意，属性 nvec 代表 n 个向量，因为它的输出是一个多维向量。另请注意，该属性是一个 numpy 数组。

现在我们有了数组，可以将其转换为动作列表列表，假设自 action_space.sample 函数 returns 一个随机函数的 numpy 数组来自MultiDiscrete action_space 即 -:

>> env.action_space.sample() # This does not return a single action but 4 actions for your case since you have a multi discrete action space of length 4.
array([2, 2, 0, 1], dtype=int64)

因此，为了将数组转换为每个维度中的可能操作列表列表，我们可以使用列表推导式 -:

>> [list(range(1, (k + 1))) for k in action_space.nvec]
[[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]

请注意，这可以扩展到任意数量的维度，并且在性能方面也非常有效。

现在你可以只使用两个循环遍历每个维度中可能的操作 -:

possible_actions = [list(range(1, (k + 1))) for k in action_space.nvec]
for action_dim in possible_actions :
    for action in action_dim :
        # Find best action.....
        pass

有关相同内容的更多信息，我希望您也访问 github 上的 this 主题，讨论了一个有点类似的问题，以防您发现相同的内容有用。

编辑：因此，根据您 @CGFoX 的评论，我假设您想要它，以便可以将所有可能的动作组合向量生成为任意数量维度的列表，有点像 -:

>> get_actions()
[[1, 1, 1, 1], [1, 1, 1, 2] ....] # For all possible combinations.

同样可以使用递归实现，并且只需要两个循环，这也可以扩展到所提供的任意多维度。

def flatten(actions) :
    # This function flattens any actions passed somewhat like so -:
    # INPUT -: [[1, 2, 3], 4, 5]
    # OUTPUT -: [1, 2, 3, 4, 5]
    
    new_actions = [] # Initializing the new flattened list of actions.
    for action in actions :
        # Loop through the actions
        if type(action) == list :
            # If any actions is a pair of actions i.e. a list e.g. [1, 1] then
            # add it's elements to the new_actions list.
            new_actions += action
        elif type(action) == int :
            # If the action is an integer then append it directly to the new_actions
            # list.
            new_actions.append(action)
    
    # Returns the new_actions list generated.
    return new_actions

def get_actions(possible_actions) :
    # This functions recieves as input the possibilities of actions for every dimension
    # and returns all possible dimensional combinations for the same.
    # Like so -:
    # INPUT-: [[1, 2, 3, 4], [1, 2, 3, 4]] # Example for 2 dimensions but can be scaled for any.
    # OUTPUT-: [[1, 1], [1, 2], [1, 3] ... [4, 1] ... [4, 4]]
    if len(possible_actions) == 1 :
        # If there is only one possible list of actions then it itself is the
        # list containing all possible combinations and thus is returned.
        return possible_actions
    pairs = [] # Initializing a list to contain all pairs of actions generated.
    for action in possible_actions[0] :
        # Now we loop over the first set of possibilities of actions i.e. index 0
        # and we make pairs of it with the second set i.e. index 1, appending each pair
        # to the pairs list.
        # NOTE: Incase the function is recursively called the first set of possibilities
        # of actions may contain vectors and thus the newly formed pair has to be flattened.
        # i.e. If a pair has already been made in previous generation like so -:
        # [[[1, 1], [2, 2], [3, 3] ... ], [1, 2, 3, 4]]
        # Then the pair formed will be this -: [[[1, 1], 1], [[1, 1], 2] ... ]
        # But we want them to be flattened like so -: [[1, 1, 1], [1, 1, 2] ... ]
        for action2 in possible_actions[1] :
            pairs.append(flatten([action, action2]))
    
    # Now we create a new list of all possible set of actions by combining the 
    # newly generated pairs and the sets of possibilities of actions that have not
    # been paired i.e. sets other than the first and the second.
    # NOTE: When we made pairs we did so only for the first two indexes and not for
    # all thus to do so we make a new list with the sets that remained unpaired
    # and the paired set. i.e.
    # BEFORE PAIRING -: [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
    # AFTER PAIRING -: [[[1, 1], [1, 2] ... ], [1, 2, 3, 4]] # Notice how the third set
    # i.e. the index 2 is still unpaired and first two sets have been paired.
    new_possible_actions = [pairs] + possible_actions[2 : ]
    # Now we recurse the function and call it within itself to make pairs for the
    # left out sets, Note that since the first two sets were combined to form a paired
    # first set now this set will be paired with the third set.
    # This recursion will keep happening until all the sets have been paired to form
    # a single set with all possible combinations.
    possible_action_vectors = get_actions(new_possible_actions)
    # Finally the result of the recursion is returned.
    # NOTE: Only the first index is returned since now the first index contains the
    # paired set of actions.
    return possible_action_vectors[0]

一旦我们定义了这个函数，它就可以与我们之前生成的动作可能性集一起使用，以获得所有可能的组合，就像这样 -:

possible_actions = [list(range(1, (k + 1))) for k in action_space.nvec]
print(get_actions(possible_actions))
>> [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 1, 3], [1, 1, 1, 4], [1, 1, 1, 5], `[1, 1, 2, 1], [1, 1, 2, 2], [1, 1, 2, 3], [1, 1, 2, 4], [1, 1, 2, 5], [1, 1, 3, 1], [1, 1, 3, 2], [1, 1, 3, 3], [1, 1, 3, 4], [1, 1, 3, 5], [1, 1, 4, 1], [1, 1, 4, 2], [1, 1, 4, 3], [1, 1, 4, 4], [1, 1, 4, 5], [1, 1, 5, 1], [1, 1, 5, 2], [1, 1, 5, 3], [1, 1, 5, 4], [1, 1, 5, 5], [1, 2, 1, 1], [1, 2, 1, 2], [1, 2, 1, 3], [1, 2, 1, 4], [1, 2, 1, 5], [1, 2, 2, 1], [1, 2, 2, 2], [1, 2, 2, 3], [1, 2, 2, 4], [1, 2, 2, 5], [1, 2, 3, 1], [1, 2, 3, 2], [1, 2, 3, 3], [1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 4, 1], [1, 2, 4, 2], [1, 2, 4, 3], [1, 2, 4, 4], [1, 2, 4, 5], [1, 2, 5, 1], [1, 2, 5, 2], [1, 2, 5, 3], [1, 2, 5, 4], [1, 2, 5, 5], [1, 3, 1, 1], [1, 3, 1, 2], [1, 3, 1, 3], [1, 3, 1, 4], [1, 3, 1, 5], [1, 3, 2, 1], [1, 3, 2, 2], [1, 3, 2, 3], [1, 3, 2, 4], [1, 3, 2, 5], [1, 3, 3, 1], [1, 3, 3, 2], [1, 3, 3, 3], [1, 3, 3, 4], [1, 3, 3, 5], [1, 3, 4, 1], [1, 3, 4, 2], [1, 3, 4, 3], [1, 3, 4, 4], [1, 3, 4, 5], [1, 3, 5, 1], [1, 3, 5, 2], [1, 3, 5, 3], [1, 3, 5, 4], [1, 3, 5, 5], [1, 4, 1, 1], [1, 4, 1, 2], [1, 4, 1, 3], [1, 4, 1, 4], [1, 4, 1, 5], [1, 4, 2, 1], [1, 4, 2, 2], [1, 4, 2, 3], [1, 4, 2, 4], [1, 4, 2, 5], [1, 4, 3, 1], [1, 4, 3, 2], [1, 4, 3, 3], [1, 4, 3, 4], [1, 4, 3, 5], [1, 4, 4, 1], [1, 4, 4, 2], [1, 4, 4, 3], [1, 4, 4, 4], [1, 4, 4, 5], [1, 4, 5, 1], [1, 4, 5, 2], [1, 4, 5, 3], [1, 4, 5, 4], [1, 4, 5, 5], [1, 5, 1, 1], [1, 5, 1, 2], [1, 5, 1, 3], [1, 5, 1, 4], [1, 5, 1, 5], [1, 5, 2, 1], [1, 5, 2, 2], [1, 5, 2, 3], [1, 5, 2, 4], [1, 5, 2, 5], [1, 5, 3, 1], [1, 5, 3, 2], [1, 5, 3, 3], [1, 5, 3, 4], [1, 5, 3, 5], [1, 5, 4, 1], [1, 5, 4, 2], [1, 5, 4, 3], [1, 5, 4, 4], [1, 5, 4, 5], [1, 5, 5, 1], [1, 5, 5, 2], [1, 5, 5, 3], [1, 5, 5, 4], [1, 5, 5, 5], [2, 1, 1, 1], [2, 1, 1, 2], [2, 1, 1, 3], [2, 1, 1, 4], [2, 1, 1, 5], [2, 1, 2, 1], [2, 1, 2, 2], [2, 1, 2, 3], [2, 1, 2, 4], [2, 1, 2, 5], [2, 1, 3, 1], [2, 1, 3, 2], [2, 1, 3, 3], [2, 1, 3, 4], [2, 1, 3, 5], [2, 1, 4, 1], [2, 1, 4, 2], [2, 1, 4, 3], [2, 1, 4, 4], [2, 1, 4, 5], [2, 1, 5, 1], [2, 1, 5, 2], [2, 1, 5, 3], [2, 1, 5, 4], [2, 1, 5, 5], [2, 2, 1, 1], [2, 2, 1, 2], [2, 2, 1, 3], [2, 2, 1, 4], [2, 2, 1, 5], [2, 2, 2, 1], [2, 2, 2, 2], [2, 2, 2, 3], [2, 2, 2, 4], [2, 2, 2, 5], [2, 2, 3, 1], [2, 2, 3, 2], [2, 2, 3, 3], [2, 2, 3, 4], [2, 2, 3, 5], [2, 2, 4, 1], [2, 2, 4, 2], [2, 2, 4, 3], [2, 2, 4, 4], [2, 2, 4, 5], [2, 2, 5, 1], [2, 2, 5, 2], [2, 2, 5, 3], [2, 2, 5, 4], [2, 2, 5, 5], [2, 3, 1, 1], [2, 3, 1, 2], [2, 3, 1, 3], [2, 3, 1, 4], [2, 3, 1, 5], [2, 3, 2, 1], [2, 3, 2, 2], [2, 3, 2, 3], [2, 3, 2, 4], [2, 3, 2, 5], [2, 3, 3, 1], [2, 3, 3, 2], [2, 3, 3, 3], [2, 3, 3, 4], [2, 3, 3, 5], [2, 3, 4, 1], [2, 3, 4, 2], [2, 3, 4, 3], [2, 3, 4, 4], [2, 3, 4, 5], [2, 3, 5, 1], [2, 3, 5, 2], [2, 3, 5, 3], [2, 3, 5, 4], [2, 3, 5, 5], [2, 4, 1, 1], [2, 4, 1, 2], [2, 4, 1, 3], [2, 4, 1, 4], [2, 4, 1, 5], [2, 4, 2, 1], [2, 4, 2, 2], [2, 4, 2, 3], [2, 4, 2, 4], [2, 4, 2, 5], [2, 4, 3, 1], [2, 4, 3, 2], [2, 4, 3, 3], [2, 4, 3, 4], [2, 4, 3, 5], [2, 4, 4, 1], [2, 4, 4, 2], [2, 4, 4, 3], [2, 4, 4, 4], [2, 4, 4, 5], [2, 4, 5, 1], [2, 4, 5, 2], [2, 4, 5, 3], [2, 4, 5, 4], [2, 4, 5, 5], [2, 5, 1, 1], [2, 5, 1, 2], [2, 5, 1, 3], [2, 5, 1, 4], [2, 5, 1, 5], [2, 5, 2, 1], [2, 5, 2, 2], [2, 5, 2, 3], [2, 5, 2, 4], [2, 5, 2, 5], [2, 5, 3, 1], [2, 5, 3, 2], [2, 5, 3, 3], [2, 5, 3, 4], [2, 5, 3, 5], [2, 5, 4, 1], [2, 5, 4, 2], [2, 5, 4, 3], [2, 5, 4, 4], [2, 5, 4, 5], [2, 5, 5, 1], [2, 5, 5, 2], [2, 5, 5, 3], [2, 5, 5, 4], [2, 5, 5, 5], [3, 1, 1, 1], [3, 1, 1, 2], [3, 1, 1, 3], [3, 1, 1, 4], [3, 1, 1, 5], [3, 1, 2, 1], [3, 1, 2, 2], [3, 1, 2, 3], [3, 1, 2, 4], [3, 1, 2, 5], [3, 1, 3, 1], [3, 1, 3, 2], [3, 1, 3, 3], [3, 1, 3, 4], [3, 1, 3, 5], [3, 1, 4, 1], [3, 1, 4, 2], [3, 1, 4, 3], [3, 1, 4, 4], [3, 1, 4, 5], [3, 1, 5, 1], [3, 1, 5, 2], [3, 1, 5, 3], [3, 1, 5, 4], [3, 1, 5, 5], [3, 2, 1, 1], [3, 2, 1, 2], [3, 2, 1, 3], [3, 2, 1, 4], [3, 2, 1, 5], [3, 2, 2, 1], [3, 2, 2, 2], [3, 2, 2, 3], [3, 2, 2, 4], [3, 2, 2, 5], [3, 2, 3, 1], [3, 2, 3, 2], [3, 2, 3, 3], [3, 2, 3, 4], [3, 2, 3, 5], [3, 2, 4, 1], [3, 2, 4, 2], [3, 2, 4, 3], [3, 2, 4, 4], [3, 2, 4, 5], [3, 2, 5, 1], [3, 2, 5, 2], [3, 2, 5, 3], [3, 2, 5, 4], [3, 2, 5, 5], [3, 3, 1, 1], [3, 3, 1, 2], [3, 3, 1, 3], [3, 3, 1, 4], [3, 3, 1, 5], [3, 3, 2, 1], [3, 3, 2, 2], [3, 3, 2, 3], [3, 3, 2, 4], [3, 3, 2, 5], [3, 3, 3, 1], [3, 3, 3, 2], [3, 3, 3, 3], [3, 3, 3, 4], [3, 3, 3, 5], [3, 3, 4, 1], [3, 3, 4, 2], [3, 3, 4, 3], [3, 3, 4, 4], [3, 3, 4, 5], [3, 3, 5, 1], [3, 3, 5, 2], [3, 3, 5, 3], [3, 3, 5, 4], [3, 3, 5, 5], [3, 4, 1, 1], [3, 4, 1, 2], [3, 4, 1, 3], [3, 4, 1, 4], [3, 4, 1, 5], [3, 4, 2, 1], [3, 4, 2, 2], [3, 4, 2, 3], [3, 4, 2, 4], [3, 4, 2, 5], [3, 4, 3, 1], [3, 4, 3, 2], [3, 4, 3, 3], [3, 4, 3, 4], [3, 4, 3, 5], [3, 4, 4, 1], [3, 4, 4, 2], [3, 4, 4, 3], [3, 4, 4, 4], [3, 4, 4, 5], [3, 4, 5, 1], [3, 4, 5, 2], [3, 4, 5, 3], [3, 4, 5, 4], [3, 4, 5, 5], [3, 5, 1, 1], [3, 5, 1, 2], [3, 5, 1, 3], [3, 5, 1, 4], [3, 5, 1, 5], [3, 5, 2, 1], [3, 5, 2, 2], [3, 5, 2, 3], [3, 5, 2, 4], [3, 5, 2, 5], [3, 5, 3, 1], [3, 5, 3, 2], [3, 5, 3, 3], [3, 5, 3, 4], [3, 5, 3, 5], [3, 5, 4, 1], [3, 5, 4, 2], [3, 5, 4, 3], [3, 5, 4, 4], [3, 5, 4, 5], [3, 5, 5, 1], [3, 5, 5, 2], [3, 5, 5, 3], [3, 5, 5, 4], [3, 5, 5, 5], [4, 1, 1, 1], [4, 1, 1, 2], [4, 1, 1, 3], [4, 1, 1, 4], [4, 1, 1, 5], [4, 1, 2, 1], [4, 1, 2, 2], [4, 1, 2, 3], [4, 1, 2, 4], [4, 1, 2, 5], [4, 1, 3, 1], [4, 1, 3, 2], [4, 1, 3, 3], [4, 1, 3, 4], [4, 1, 3, 5], [4, 1, 4, 1], [4, 1, 4, 2], [4, 1, 4, 3], [4, 1, 4, 4], [4, 1, 4, 5], [4, 1, 5, 1], [4, 1, 5, 2], [4, 1, 5, 3], [4, 1, 5, 4], [4, 1, 5, 5], [4, 2, 1, 1], [4, 2, 1, 2], [4, 2, 1, 3], [4, 2, 1, 4], [4, 2, 1, 5], [4, 2, 2, 1], [4, 2, 2, 2], [4, 2, 2, 3], [4, 2, 2, 4], [4, 2, 2, 5], [4, 2, 3, 1], [4, 2, 3, 2], [4, 2, 3, 3], [4, 2, 3, 4], [4, 2, 3, 5], [4, 2, 4, 1], [4, 2, 4, 2], [4, 2, 4, 3], [4, 2, 4, 4], [4, 2, 4, 5], [4, 2, 5, 1], [4, 2, 5, 2], [4, 2, 5, 3], [4, 2, 5, 4], [4, 2, 5, 5], [4, 3, 1, 1], [4, 3, 1, 2], [4, 3, 1, 3], [4, 3, 1, 4], [4, 3, 1, 5], [4, 3, 2, 1], [4, 3, 2, 2], [4, 3, 2, 3], [4, 3, 2, 4], [4, 3, 2, 5], [4, 3, 3, 1], [4, 3, 3, 2], [4, 3, 3, 3], [4, 3, 3, 4], [4, 3, 3, 5], [4, 3, 4, 1], [4, 3, 4, 2], [4, 3, 4, 3], [4, 3, 4, 4], [4, 3, 4, 5], [4, 3, 5, 1], [4, 3, 5, 2], [4, 3, 5, 3], [4, 3, 5, 4], [4, 3, 5, 5], [4, 4, 1, 1], [4, 4, 1, 2], [4, 4, 1, 3], [4, 4, 1, 4], [4, 4, 1, 5], [4, 4, 2, 1], [4, 4, 2, 2], [4, 4, 2, 3], [4, 4, 2, 4], [4, 4, 2, 5], [4, 4, 3, 1], [4, 4, 3, 2], [4, 4, 3, 3], [4, 4, 3, 4], [4, 4, 3, 5], [4, 4, 4, 1], [4, 4, 4, 2], [4, 4, 4, 3], [4, 4, 4, 4], [4, 4, 4, 5], [4, 4, 5, 1], [4, 4, 5, 2], [4, 4, 5, 3], [4, 4, 5, 4], [4, 4, 5, 5], [4, 5, 1, 1], [4, 5, 1, 2], [4, 5, 1, 3], [4, 5, 1, 4], [4, 5, 1, 5], [4, 5, 2, 1], [4, 5, 2, 2], [4, 5, 2, 3], [4, 5, 2, 4], [4, 5, 2, 5], [4, 5, 3, 1], [4, 5, 3, 2], [4, 5, 3, 3], [4, 5, 3, 4], [4, 5, 3, 5], [4, 5, 4, 1], [4, 5, 4, 2], [4, 5, 4, 3], [4, 5, 4, 4], [4, 5, 4, 5], [4, 5, 5, 1], [4, 5, 5, 2], [4, 5, 5, 3], [4, 5, 5, 4], [4, 5, 5, 5], [5, 1, 1, 1], [5, 1, 1, 2], [5, 1, 1, 3], [5, 1, 1, 4], [5, 1, 1, 5], [5, 1, 2, 1], [5, 1, 2, 2], [5, 1, 2, 3], [5, 1, 2, 4], [5, 1, 2, 5], [5, 1, 3, 1], [5, 1, 3, 2], [5, 1, 3, 3], [5, 1, 3, 4], [5, 1, 3, 5], [5, 1, 4, 1], [5, 1, 4, 2], [5, 1, 4, 3], [5, 1, 4, 4], [5, 1, 4, 5], [5, 1, 5, 1], [5, 1, 5, 2], [5, 1, 5, 3], [5, 1, 5, 4], [5, 1, 5, 5], [5, 2, 1, 1], [5, 2, 1, 2], [5, 2, 1, 3], [5, 2, 1, 4], [5, 2, 1, 5], [5, 2, 2, 1], [5, 2, 2, 2], [5, 2, 2, 3], [5, 2, 2, 4], [5, 2, 2, 5], [5, 2, 3, 1], [5, 2, 3, 2], [5, 2, 3, 3], [5, 2, 3, 4], [5, 2, 3, 5], [5, 2, 4, 1], [5, 2, 4, 2], [5, 2, 4, 3], [5, 2, 4, 4], [5, 2, 4, 5], [5, 2, 5, 1], [5, 2, 5, 2], [5, 2, 5, 3], [5, 2, 5, 4], [5, 2, 5, 5], [5, 3, 1, 1], [5, 3, 1, 2], [5, 3, 1, 3], [5, 3, 1, 4], [5, 3, 1, 5], [5, 3, 2, 1], [5, 3, 2, 2], [5, 3, 2, 3], [5, 3, 2, 4], [5, 3, 2, 5], [5, 3, 3, 1], [5, 3, 3, 2], [5, 3, 3, 3], [5, 3, 3, 4], [5, 3, 3, 5], [5, 3, 4, 1], [5, 3, 4, 2], [5, 3, 4, 3], [5, 3, 4, 4], [5, 3, 4, 5], [5, 3, 5, 1], [5, 3, 5, 2], [5, 3, 5, 3], [5, 3, 5, 4], [5, 3, 5, 5], [5, 4, 1, 1], [5, 4, 1, 2], [5, 4, 1, 3], [5, 4, 1, 4], [5, 4, 1, 5], [5, 4, 2, 1], [5, 4, 2, 2], [5, 4, 2, 3], [5, 4, 2, 4], [5, 4, 2, 5], [5, 4, 3, 1], [5, 4, 3, 2], [5, 4, 3, 3], [5, 4, 3, 4], [5, 4, 3, 5], [5, 4, 4, 1], [5, 4, 4, 2], [5, 4, 4, 3], [5, 4, 4, 4], [5, 4, 4, 5], [5, 4, 5, 1], [5, 4, 5, 2], [5, 4, 5, 3], [5, 4, 5, 4], [5, 4, 5, 5], [5, 5, 1, 1], [5, 5, 1, 2], [5, 5, 1, 3], [5, 5, 1, 4], [5, 5, 1, 5], [5, 5, 2, 1], [5, 5, 2, 2], [5, 5, 2, 3], [5, 5, 2, 4], [5, 5, 2, 5], [5, 5, 3, 1], [5, 5, 3, 2], [5, 5, 3, 3], [5, 5, 3, 4], [5, 5, 3, 5], [5, 5, 4, 1], [5, 5, 4, 2], [5, 5, 4, 3], [5, 5, 4, 4], [5, 5, 4, 5], [5, 5, 5, 1], [5, 5, 5, 2], [5, 5, 5, 3], [5, 5, 5, 4], [5, 5, 5, 5]]

EDIT-2：我修复了一些以前返回嵌套列表的代码，现在返回的列表是成对的列表，而不是嵌套在另一个列表中。

EDIT-3-：修正了我的拼写错误。

Answer 2

也可以使用下面的函数，分别列出观察或动作中的所有状态或动作 space。

def get_space_list(space):

    """
    Converts gym `space`, constructed from `types`, to list `space_list`
    """

    # -------------------------------- #

    types = [
        gym.spaces.multi_binary.MultiBinary,
        gym.spaces.discrete.Discrete,
        gym.spaces.multi_discrete.MultiDiscrete,
        gym.spaces.dict.Dict,
        gym.spaces.tuple.Tuple,
    ]

    if type(space) not in types:
        raise ValueError(f'input space {space} is not constructed from spaces of types:' + '\n' + str(types))

    # -------------------------------- #

    if type(space) is gym.spaces.multi_binary.MultiBinary:
        return [
            np.reshape(np.array(element), space.n)
            for element in itertools.product(
                *[range(2)] * np.prod(space.n)
            )
        ]

    if type(space) is gym.spaces.discrete.Discrete:
        return list(range(space.n))

    if type(space) is gym.spaces.multi_discrete.MultiDiscrete:
        return [
            np.array(element) for element in itertools.product(
                *[range(n) for n in space.nvec]
            )
        ]

    if type(space) is gym.spaces.dict.Dict:

        keys = space.spaces.keys()
        
        values_list = itertools.product(
            *[get_space_list(sub_space) for sub_space in space.spaces.values()]
        )

        return [
            {key: value for key, value in zip(keys, values)}
            for values in values_list
        ]

        return space_list

    if type(space) is gym.spaces.tuple.Tuple:
        return [
            list(element) for element in itertools.product(
                *[get_space_list(sub_space) for sub_space in space.spaces]
            )
        ]

    # -------------------------------- #

OpenAI Gym：在一个动作中遍历所有可能的动作 space

OpenAI Gym: Walk through all possible actions in an action space

python

for-loop

openai-gym