Tic Tac Toe Minimax Algorithm in Python: Computer Algorithm Is Not Robust

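Feel free to ask for clarification, since this is a fairly long question.

I have a Tic Tac Toe (X's and O's) game that works well when played by 2 human players.

My current problem is implementing the minimax algorithm for it. I have 4 functions that make up the computer's brain, and each has a short description above it. The first two functions are mainly there for context.

I think the main problem is either how the scores are calculated or how the game is chosen: many games end up with the same score, and the code picks the first one with the highest score, which is not necessarily the best choice.

Here is the code.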

from itertools import permutations  # needed by pos_moves()

# Find All Open Positions On the Current Board
def posis(self):
    return [x for x, coordinate in enumerate(self.gameBoard) if coordinate not in ('X', 'O')]

# Return All Possible Orderings (Permutations) Of The Open Positions
def pos_moves(self):
    return list(permutations(self.posis()))
# The Computer Plays Every Possible Game Based on self.pos_moves()
# At The End It Returns Each Game and Its Score

def min_max(self,player):
    moves = self.pos_moves()
    pos_boards, scores = [],[]
    for move in moves:
        board = self.gameBoard[:] # This is just the current game board
        depth = 0
        function_player = player

        while True: # This is the loop that each possible game (from pos_moves) goes through for evaluation. at the end giving it a score. 
            board[move[depth]] = function_player
            if ((self.win_checker(board)==True) or (self.board_full(board) != False)):
                if self.win_checker(board) == True:
                    if function_player == player: # Meaning the winner of that game is the computer
                        score = 6-(depth + 1)
                    elif function_player != player: # meaning the winner of that game is the human player
                        score = -6+(depth + 1)
                else: # If the game is a draw
                    score = 0
                pos_boards.append(move) # Adding the board to pos_boards
                scores.append(score) # Adding the score to scores with the same index as its corresponding board from above
                break
            function_player = self.change_player(function_player)
            depth+=1       
    return (pos_boards,scores)
# I Think This Is Where The Problem Is
def comp_think(self,player):
    pos_boards, scores = self.min_max(player) # run the search once and unpack both lists
    play = pos_boards[scores.index(max(scores))] # this is supposed to be the best path for the computer to choose. But it's not.
    print(play)
    return str(play[0]) # returning the first move in the best path for the computer to play
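
As a small illustration of the tie concern: `list.index` returns the position of the first match only, so when several games share the top score, the later ones are never considered. The sketch below uses made-up scores and a hypothetical helper `best_indices` (not part of the original class) to list every tied index.

```python
def best_indices(scores):
    """Return every index whose score equals the maximum (hypothetical helper)."""
    top = max(scores)
    return [i for i, s in enumerate(scores) if s == top]

# Made-up scores for illustration only; three games tie for the best score.
scores = [0, 3, -2, 3, 3]

first_only = scores.index(max(scores))  # list.index stops at the first match
all_best = best_indices(scores)         # every game that shares the top score

print(first_only, all_best)
```

Seeing how many games tie at the top may help judge whether the selection step or the scoring step is the weaker part.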

Here is a sample game:

0 | 1 | 2 
 --------- 
 3 | 4 | 5 
 --------- 
 6 | 7 | 8 


X, Chose a slot: 0 # I play top left

 X | 1 | 2 
 --------- 
 3 | 4 | 5 
 --------- 
 6 | 7 | 8 

(1, 2, 4, 3, 7, 5, 6, 8) #Computer plays top middle but I would like it to play middle middle

 X | O | 2 
 --------- 
 3 | 4 | 5 
 --------- 
 6 | 7 | 8 


X, Chose a slot: 4

 X | O | 2 
 --------- 
 3 | X | 5 
 --------- 
 6 | 7 | 8 

(2, 3, 5, 7, 8, 6)

 X | O | O 
 --------- 
 3 | X | 5 
 --------- 
 6 | 7 | 8 


X, Chose a slot: 8

 X | O | O 
 --------- 
 3 | X | 5 
 --------- 
 6 | 7 | X 

X Wins!!!

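X is the human player in this case.

Each tuple is the variable play from comp_think(). The first move is the computer's, the second is the human player's, and so on. The whole tuple represents the computer's idea of the best possible outcome for itself.

One potential bug I see here is that the computer picks the move that could lead to the best outcome. For example, if it could possibly win in just 3 moves, it will choose that path. However, this assumes the player will make the specific moves that let the computer win, which is unlikely. Perhaps, instead of finding the single move that could lead to the highest score, you could add up all the scores that can follow each move and pick the move with the highest total? After all, notice that the tuples it prints are unlikely to happen - the player would simply have to ignore the logical path.

Also, you could slightly reduce the weight of game depth when determining the score, if it matters at all - why is winning earlier good for the bot? A win is a win.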
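
The "add up the scores per move" idea could be sketched roughly as follows. It assumes the `(pos_boards, scores)` pair has the shape returned by `min_max()` above; the function name `pick_by_total` and the sample data are hypothetical. Note that this is a heuristic, not true minimax, so it may still pick a losing move.

```python
from collections import defaultdict

def pick_by_total(pos_boards, scores):
    """Sum the scores of all games that start with the same first move,
    then return the first move with the highest total (hypothetical helper)."""
    totals = defaultdict(int)
    for move, score in zip(pos_boards, scores):
        totals[move[0]] += score
    return max(totals, key=totals.get)

# Made-up games and scores for illustration only.
pos_boards = [(1, 2, 4), (1, 4, 2), (4, 1, 2), (4, 2, 1)]
scores = [3, -4, 0, 0]

by_max = pos_boards[scores.index(max(scores))][0]  # what comp_think() does now
by_total = pick_by_total(pos_boards, scores)

print(by_max, by_total)
```

In this made-up data, move 1 has the single best game but also a bad one, so its total is lower than that of move 4.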
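
For comparison, here is a minimal sketch of conventional recursive minimax, which assumes the opponent always picks the reply that is worst for the computer, with the depth weighting exposed as a parameter. This is not the poster's code: the names `winner`, `open_slots`, `minimax`, `best_move` and `depth_weight`, the base score of 10, and the board layout (a list of 9 cells where an empty cell holds its own index) are all assumptions made for illustration.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] in ('X', 'O') and board[a] == board[b] == board[c]:
            return board[a]
    return None

def open_slots(board):
    """Indices of cells that hold neither 'X' nor 'O'."""
    return [i for i, cell in enumerate(board) if cell not in ('X', 'O')]

def minimax(board, to_move, computer, depth=0, depth_weight=0):
    """Score the position for `computer`, assuming both sides play their best.
    depth_weight=0 means a win is a win; keep it at 1 or below so that
    10 - depth_weight * depth stays positive."""
    won = winner(board)
    if won is not None:
        base = 10 - depth_weight * depth
        return base if won == computer else -base
    slots = open_slots(board)
    if not slots:
        return 0  # draw
    other = 'O' if to_move == 'X' else 'X'
    results = []
    for slot in slots:
        saved = board[slot]
        board[slot] = to_move          # try the move
        results.append(minimax(board, other, computer, depth + 1, depth_weight))
        board[slot] = saved            # undo it
    # The computer maximises its score; the opponent minimises it.
    return max(results) if to_move == computer else min(results)

def best_move(board, computer, depth_weight=0):
    """Return the open slot with the highest minimax score (first one on ties)."""
    other = 'O' if computer == 'X' else 'X'
    best_slot, best_score = None, None
    for slot in open_slots(board):
        saved = board[slot]
        board[slot] = computer
        score = minimax(board, other, computer, 1, depth_weight)
        board[slot] = saved
        if best_score is None or score > best_score:
            best_slot, best_score = slot, score
    return best_slot

opening = ['X', 1, 2, 3, 4, 5, 6, 7, 8]       # human took the top-left corner
reply = best_move(opening, 'O')

later = ['X', 'O', 2, 3, 'X', 5, 6, 7, 8]     # second position from the sample game
block = best_move(later, 'O', depth_weight=1)

print(reply, block)
```

With `depth_weight=0` every losing move scores the same, so the sketch cannot tell losing now from losing later. A small positive weight makes it prefer the slower loss, which is one possible reason to keep some depth term in the score.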