Variable updating wrong in loop - Python (Q-learning)


for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position

    status = -1
    # loop over steps taken by the player
    while status == -1: #the status of the game is -1, terminate if 1 (see status_list above)

        # Find out what move to make using  

        move, action = action_fcn(q_in,epsilon,wind)
        # update location, check grid,reward_list, and status_list 
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        print('new loop')
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        status = status_list[grid_state]
        status = int(status)
        if status == 1:
            Q[position[0], position[1], action] = reward
            break  # Game over
        else:
            Q[position[0], position[1], action] = (1 - alpha) * Q[position[0], position[1], action] + alpha * (reward + gamma * Q[newposition[0], newposition[1], action])
        position = newposition


new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]


>>> newposition = position

So in effect, when you increment newposition, you are also incrementing position.
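A minimal reproduction of that effect, assuming position is a small NumPy integer array like the ones in the question's output (the values and the move below are made up for illustration):

```python
import numpy as np

position = np.array([16, 26])
move = np.array([1, 0])

newposition = position          # binds a second name to the SAME array; nothing is copied
newposition[0] = position[0] + move[0]
newposition[1] = position[1] + move[1]

print(newposition)  # [17 26]
print(position)     # [17 26]  (changed too, because it is the same array)
```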

So just make newposition and position different objects. I mean, make sure that id(newposition) != id(position) and you'll be fine. Because at the moment, I guess the two ids are the same, aren't they?

Why do position and newposition give the same output and update together in the next loop?

Because they are the same object. I'm not (only) saying that they are equal; I'm saying that newposition is position, i.e. you currently have (newposition is position) is True.
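To see the difference between equality and identity, here is a small sketch using a made-up position array. For NumPy arrays, == compares element-wise and returns an array, so np.array_equal is used to compare contents, while is and id() compare object identity:

```python
import numpy as np

position = np.array([0, 19])

alias = position                 # same object, just another name
independent = position.copy()    # new object with equal contents

print(alias is position)                       # True
print(id(alias) == id(position))               # True
print(independent is position)                 # False
print(np.array_equal(independent, position))   # True: equal values, different objects
```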

Just define newposition independently of position. For example:

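import numpy as np
# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position    = np.array([0,19])
    newposition = np.empty(2, dtype=int)  # a separate array; int dtype so it can be used as a grid index
    # [...]
        # [...]
        position = newposition.copy()  # copy here too, or the two names share one array again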
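The question's loop ends with position = newposition, which on its own makes the two names share one array again after the first step. Below is a self-contained sketch of the corrected step, assuming a hypothetical fixed list of moves in place of action_fcn and leaving out the grid and the Q update:

```python
import numpy as np

# Hypothetical fixed sequence of moves standing in for action_fcn(...)
moves = [np.array([1, 0]), np.array([1, 0]), np.array([0, -1])]

position = np.array([16, 26])
newposition = np.empty(2, dtype=int)   # its own array, integer dtype so it can index a grid
history = []

for move in moves:
    newposition[0] = position[0] + move[0]
    newposition[1] = position[1] + move[1]
    history.append((position.tolist(), newposition.tolist()))
    position = newposition.copy()      # copy; plain `position = newposition` would alias them again

for old, new in history:
    print(old, "->", new)
# [16, 26] -> [17, 26]
# [17, 26] -> [18, 26]
# [18, 26] -> [18, 25]
```

Each printed pair shows the old and the new position differing by exactly one move, which is what the Q update needs in order to read Q at the previous state and at the next state.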

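Also, you may have good reasons for doing it this way, but keep in mind that if move and position have the same shape and carry "the same information", you could also just do: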

# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]

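and remove the line that initializes newposition with np.empty. Because position + move builds a new array on every step, the position = newposition assignment at the end of the loop becomes harmless: newposition is rebound to a fresh array before it is written to again.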
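A quick check of why that version is safe, assuming both position and move are NumPy arrays (with plain Python lists, + concatenates instead, e.g. [16, 26] + [1, 0] gives [16, 26, 1, 0]):

```python
import numpy as np

position = np.array([16, 26])
move = np.array([1, 0])

newposition = position + move   # `+` allocates a brand-new array
print(newposition is position)  # False
print(position)                 # [16 26]  (unchanged)

position = newposition          # both names now share one array...
newposition = position + move   # ...but this rebinds newposition to another new array
print(position, newposition)    # [17 26] [18 26]
```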

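That's because you are trying to copy one list into another with the = operator. In Python, = never copies anything (for lists, NumPy arrays, or any other object): it binds the name on the left to the same object the right-hand side refers to, so both names physically point to the same object in memory.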

To actually copy a list, use the list.copy() method (NumPy arrays have the equivalent ndarray.copy() method).
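A short illustration with made-up values. One caveat worth knowing: list.copy() is a shallow copy, so for nested lists the inner lists are still shared, and copy.deepcopy from the standard library is needed to duplicate them too:

```python
import copy

position = [16, 26]
newposition = position.copy()     # shallow copy: a new list object
newposition[0] += 1
print(position, newposition)      # [16, 26] [17, 26]

# list.copy() is shallow: nested lists are still shared
grid = [[0, 0], [0, 0]]
shallow = grid.copy()
deep = copy.deepcopy(grid)
shallow[0][0] = 9
print(grid[0][0], deep[0][0])     # 9 0
```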