Python Negamax 算法
Python Negamax Algorithm
我有尽可能简单的 negamax 算法,用于评估 Tic Tac Toe 中的位置。游戏状态在 numpy 中存储为一个数组,X 的棋子用 1 表示,O 的棋子用 4 表示。
我刚才正在测试这个,发现:
a = np.zeros(9).reshape(3,3)
negaMax(a, 6, 1) # Returned zero as it should
negaMax(a, 7, 1) # Returns 100
这意味着我的算法认为它已经找到了让 X 在 Tic Tac Toe 游戏中赢得七层的方法,这对于正常的游戏来说显然是不可能的。我无法弄清楚如何让它打印出它找到的最佳动作,所以我在调试它时遇到了真正的麻烦。我做错了什么?
def winCheck(state):
"""Takes a position, and returns the outcome of that game"""
# Sums which correspond to a line across a column
winNums = list(state.sum(axis=0))
# Sums which correspond to a line across a row
winNums.extend(list(state.sum(axis=1)))
# Sums which correspond to a line across the main diagonal
winNums.append(state.trace())
# Sums which correspond to a line across the off diagonal
winNums.append(np.flipud(state).trace())
if Square.m in winNums:
return 'X'
elif (Square.m**2 + Square.m) in winNums:
return 'O'
elif np.count_nonzero(state) == Square.m**2:
return 'D'
else:
return None
def moveFind(state):
"""Takes a position as an nparray and determines the legal moves"""
moveChoices = []
# Iterate over state, to determine which squares are empty
it = np.nditer(state, flags=['multi_index'])
while not it.finished:
if it[0] == 0:
moveChoices.append(it.multi_index)
it.iternext()
return moveChoices
def moveSim(state, move, player):
"""Create the state of the player having moved without interfering with the board"""
simState = state.copy()
if player == 1:
simState[move] = 1
else:
simState[move] = gamecfg.n + 1
return simState
def positionScore(state):
"""The game is either won or lost"""
if winCheck(state) == 'X':
return 100
elif winCheck(state) == 'O':
return -100
else:
return 0
def negaMax(state, depth, colour):
"""Recursively find the best move via a negamax search"""
if depth == 0:
return positionScore(state) * colour
highScore = -100
moveList = moveFind(state)
for move in moveList:
score = -negaMax(moveSim(state, move, colour), depth -1, colour * -1)
highScore = max(score, highScore)
return highScore
当一行 3 个符号出现时,您的代码不会认为游戏停止。
这意味着它正在玩井字游戏的变体,其中即使在 O 已经连成 3 之后,如果他连成 3,X 也会获胜。
对于这个变体,程序正确地发现 X 有可能总是赢!
(我在我制作的国际象棋程序中遇到了同样的情况,如果它能稍后将死,计算机很乐意牺牲它的国王......)
我有尽可能简单的 negamax 算法,用于评估 Tic Tac Toe 中的位置。游戏状态在 numpy 中存储为一个数组,X 的棋子用 1 表示,O 的棋子用 4 表示。
我刚才正在测试这个,发现:
a = np.zeros(9).reshape(3,3)
negaMax(a, 6, 1) # Returned zero as it should
negaMax(a, 7, 1) # Returns 100
这意味着我的算法认为它已经找到了让 X 在 Tic Tac Toe 游戏中赢得七层的方法,这对于正常的游戏来说显然是不可能的。我无法弄清楚如何让它打印出它找到的最佳动作,所以我在调试它时遇到了真正的麻烦。我做错了什么?
def winCheck(state):
"""Takes a position, and returns the outcome of that game"""
# Sums which correspond to a line across a column
winNums = list(state.sum(axis=0))
# Sums which correspond to a line across a row
winNums.extend(list(state.sum(axis=1)))
# Sums which correspond to a line across the main diagonal
winNums.append(state.trace())
# Sums which correspond to a line across the off diagonal
winNums.append(np.flipud(state).trace())
if Square.m in winNums:
return 'X'
elif (Square.m**2 + Square.m) in winNums:
return 'O'
elif np.count_nonzero(state) == Square.m**2:
return 'D'
else:
return None
def moveFind(state):
"""Takes a position as an nparray and determines the legal moves"""
moveChoices = []
# Iterate over state, to determine which squares are empty
it = np.nditer(state, flags=['multi_index'])
while not it.finished:
if it[0] == 0:
moveChoices.append(it.multi_index)
it.iternext()
return moveChoices
def moveSim(state, move, player):
"""Create the state of the player having moved without interfering with the board"""
simState = state.copy()
if player == 1:
simState[move] = 1
else:
simState[move] = gamecfg.n + 1
return simState
def positionScore(state):
"""The game is either won or lost"""
if winCheck(state) == 'X':
return 100
elif winCheck(state) == 'O':
return -100
else:
return 0
def negaMax(state, depth, colour):
"""Recursively find the best move via a negamax search"""
if depth == 0:
return positionScore(state) * colour
highScore = -100
moveList = moveFind(state)
for move in moveList:
score = -negaMax(moveSim(state, move, colour), depth -1, colour * -1)
highScore = max(score, highScore)
return highScore
当一行 3 个符号出现时,您的代码不会认为游戏停止。
这意味着它正在玩井字游戏的变体,其中即使在 O 已经连成 3 之后,如果他连成 3,X 也会获胜。
对于这个变体,程序正确地发现 X 有可能总是赢!
(我在我制作的国际象棋程序中遇到了同样的情况,如果它能稍后将死,计算机很乐意牺牲它的国王......)