Variable updating wrong in loop - Python (Q-learning)
Why do position and newposition give the same output and update together in the next loop?
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position = np.array([0,19])
    status = -1
    # loop over steps taken by the player
    while status == -1:  # the status of the game is -1, terminate if 1 (see status_list above)
        # Find out what move to make using
        q_in = Q[position[0],position[1]]
        move, action = action_fcn(q_in,epsilon,wind)
        # update location, check grid, reward_list, and status_list
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        print('new loop')
        print(newposition)
        print(position)
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        status = status_list[grid_state]
        status = int(status)
        if status == 1:
            Q[position[0],position[1],action] = reward
            break  # Game over
        else:
            Q[position[0],position[1],action] = (1-alpha)*Q[position[0],position[1],action] + alpha*(reward + gamma*Q[newposition[0],newposition[1],action])
        position = newposition
It prints:
new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]
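Apparently, somewhere you did not show us, you do
>>> newposition = position
So when you increment newposition, you are actually incrementing position as well.
So just make newposition distinct from position. I mean, make them have id(newposition) != id(position) and you will be fine. Because currently, I guess the two ids are the same, aren't they?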
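A minimal sketch of what this answer describes, with made-up values standing in for the asker's arrays: after newposition = position both names report the same id, so an element write through one name shows up through the other. Giving newposition its own array (here via copy()) separates them.

```python
import numpy as np

position = np.array([0, 19])
newposition = position          # no copy: both names now refer to the same array

print(id(newposition) == id(position))   # the two ids are identical

newposition[0] = position[0] + 1         # in-place write through one name...
print(position)                          # ...is visible through the other

# Giving newposition its own array breaks the link
newposition = position.copy()
print(id(newposition) == id(position))
newposition[0] += 1                      # only the copy changes now
print(position, newposition)
```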
Why does the position and newposition give the same output and update together in the next loop?
Because they are the same object. I am not (only) saying that they are equal; I am saying that newposition is position, i.e. you currently have (newposition is position) is True.
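To see the difference between being equal and being the same object, the sketch below compares a second name bound to one array with a separately built array that merely has the same contents (the values are arbitrary):

```python
import numpy as np

position = np.array([16, 26])
same = position                  # alias: the very same object
equal = np.array([16, 26])       # a separate object with equal contents

print(same is position)                    # identity check
print(equal is position)
print(np.array_equal(equal, position))     # element-wise equality of the whole arrays
```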
Just define newposition independently of position. For example:
# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position = np.array([0,19])
    newposition = np.empty((2,), dtype=int)  # integer dtype so the entries can be used as indices
    # [...]
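One caveat, sketched under assumptions: with the loop as posted, the final line position = newposition rebinds position to the preallocated array, so from the second step of each game onward the two names are aliases again. The block below replaces action_fcn with a hypothetical fixed list of moves and a hypothetical helper run() to show this, together with one possible workaround (copying the values with position[:] = newposition instead of rebinding; position = newposition.copy() would also work).

```python
import numpy as np

# Hypothetical moves standing in for what action_fcn would return
moves = [np.array([1, 0]), np.array([1, 0]), np.array([0, -1])]

def run(copy_back):
    """Hypothetical helper: walk through `moves`, recording whether the two names share one array."""
    position = np.array([0, 19])
    newposition = np.empty(2, dtype=int)   # separate array, integer dtype for indexing
    aliased = []
    for move in moves:
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        aliased.append(newposition is position)
        if copy_back:
            position[:] = newposition      # copy the values, keep two distinct arrays
        else:
            position = newposition         # rebinding: both names now share one array
    return aliased, position

print(run(copy_back=False))   # aliased from the second step on
print(run(copy_back=True))    # never aliased
```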
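Also, you may have good reasons for doing it this way, but keep in mind that if move and position have the same shape and convey "the same information", you can also do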
# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]
and remove the line that preallocates newposition with np.empty.
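With newposition = position + move, the + builds a new array on every step, so the trailing position = newposition no longer leaves the two names sharing an array that gets written in place. A minimal sketch, again with a hypothetical list of moves standing in for action_fcn:

```python
import numpy as np

# Hypothetical moves standing in for what action_fcn would return
moves = [np.array([1, 0]), np.array([1, 0]), np.array([0, -1])]

position = np.array([0, 19])
history = []
for move in moves:
    newposition = position + move        # `+` returns a brand-new array
    history.append(newposition is position)
    position = newposition               # safe: the next step builds another new array
print(history, position)
```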
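That is because you are trying to copy one list into another with the = operator; used with lists, it assigns the reference held by the right-hand variable to the left-hand variable, so both names physically point to the same memory.
To actually copy a list, use the list.copy() method (for a NumPy array such as position, ndarray.copy() does the same).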
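A small illustration of = versus copy() for both a plain list and a NumPy array (the values are arbitrary):

```python
import numpy as np

a = [0, 19]
b = a            # same list object, no copy
c = a.copy()     # new list with the same elements (a shallow copy)
b[0] = 5
print(a, c)      # a changed through b, c did not

pos = np.array([0, 19])
newpos = pos.copy()   # ndarray.copy() returns an independent array
newpos[0] += 1
print(pos, newpos)    # pos is unchanged
```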