Minimax algorithm only returning specific set of values

Question

I have implemented a minimax algorithm in Python for a basic tic-tac-toe AI, as follows:

def minimax(currentBoard, player):
    if isGameOver(currentBoard):
        score = evaluate(currentBoard)
        return score
    for cell in getEmptySpots(currentBoard):
        x = cell[0]
        y = cell[1]
        currentBoard[x][y] = player
        bestScore = -1000000
        score = minPlay(currentBoard, -player)
        currentBoard[x][y] = 0
        if score > bestScore:
            bestScore = score
            bestMove = cell
            print('Best move:')
            print(bestMove)
            print('\n')
        return bestMove

def minPlay(currentBoard, player):
    if isGameOver(currentBoard):
        score = evaluate(currentBoard)
        return score
    for cell in getEmptySpots(currentBoard):
        x = cell[0]
        y = cell[1]
        currentBoard[x][y] = player
        bestScore = 1000000
        score = maxPlay(currentBoard, -player)
        currentBoard[x][y] = 0
        if score < bestScore:
            bestScore = score
        return bestScore

def maxPlay(currentBoard, player):
    if isGameOver(currentBoard):
        score = evaluate(currentBoard)
        return score
    for cell in getEmptySpots(currentBoard):
        x = cell[0]
        y = cell[1]
        currentBoard[x][y] = player
        bestScore = -1000000
        score = minPlay(currentBoard, -player)
        currentBoard[x][y] = 0
        if score > bestScore:
            bestScore = score
        return bestScore

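The other supporting functions are self-explanatory. However, this script does not work as it should. For example, it always seems to start by choosing [0,0] and then proceeds with a relatively constant set of moves, even when better moves are available. Also, given the following state (and, in general, states in which a winning move is available):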

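where 1 represents the human and -1 represents the computer, the computer chooses [2][1] as the best move instead of [2][2] or [1][2], either of which would result in a win.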

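I have gone through a number of questions related to minimax implementations in different languages, and as far as I can tell my code makes logical sense. I am therefore not sure what the problem might be. My full code can be found here.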

Answer 1

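There is a logic error in your loops over the empty cells! You have to initialize bestScore before the loop and return bestScore after the loop. Otherwise bestScore is reset for every cell and the function returns during the first iteration, so minimax, minPlay, and maxPlay will always pick the first empty cell.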

Here is the fix for minPlay (minimax and maxPlay can be fixed analogously):

def minPlay(currentBoard, player):
    if isGameOver(currentBoard):
        score = evaluate(currentBoard)
        return score
    # initialize the "best score" before the loop
    bestScore = 1000000
    for cell in getEmptySpots(currentBoard):
        x = cell[0]
        y = cell[1]
        currentBoard[x][y] = player
        score = maxPlay(currentBoard, -player)
        currentBoard[x][y] = 0
        # update the "best score"
        if score < bestScore:
            bestScore = score
    # return the "best score" after inspecting *all* empty cells
    return bestScore
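
The effect of the misplaced initialization and return can be seen in isolation, without any board logic. The sketch below is only an illustration: min_buggy and min_fixed are hypothetical names, and the list of scores stands in for the values that the recursive calls would return for each empty cell.

```python
def min_buggy(scores):
    # Mirrors the structure in the question: the reset and the return
    # both sit inside the loop body.
    for score in scores:
        best = 1000000          # reset for every candidate
        if score < best:
            best = score
        return best             # leaves during the first iteration


def min_fixed(scores):
    # Mirrors the corrected structure: initialize once, return once.
    best = 1000000
    for score in scores:
        if score < best:
            best = score
    return best


scores = [5, -3, 2]             # stand-in for the per-cell results
print(min_buggy(scores))        # 5, the first candidate
print(min_fixed(scores))        # -3, the true minimum
```

The buggy variant never looks past the first candidate, which matches the symptom of always playing [0,0] on an empty board.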

Answer 2

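There are two problems. The first is the logic problem described in @Flopp's answer. The second is that isGameOver does not return true when there are no more moves left. As a result, a drawn position, which should score 0, is returned as the initial maximum or minimum score instead.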

Here:

def minPlay(currentBoard, player):
    if isGameOver(currentBoard):
        score = evaluate(currentBoard)
        return score

The relevant (fixed) line is here (it is not pretty, it just demonstrates that it will work):

def isGameOver(currentBoard):
    return checkGameOver(currentBoard, HUMAN) or checkGameOver(currentBoard, COMPUTER) or getEmptySpots(currentBoard) == []
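
To see why the extra check matters, the following sketch compares a wins-only test with the fixed one on a full board that nobody has won. The bodies of checkGameOver and getEmptySpots are assumptions (the question's full code is not shown here), and isGameOverWinsOnly is a hypothetical name for the presumed original behaviour.

```python
HUMAN = 1
COMPUTER = -1


def getEmptySpots(board):
    # Assumed helper: every [row, column] pair that still holds 0.
    return [[r, c] for r in range(3) for c in range(3) if board[r][c] == 0]


def checkGameOver(board, player):
    # Assumed helper: True if `player` holds a full row, column or diagonal.
    lines = [[(r, c) for c in range(3)] for r in range(3)]
    lines += [[(r, c) for r in range(3)] for c in range(3)]
    lines += [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
    return any(all(board[r][c] == player for r, c in line) for line in lines)


def isGameOverWinsOnly(board):
    # Presumed original behaviour: only a win ends the game.
    return checkGameOver(board, HUMAN) or checkGameOver(board, COMPUTER)


def isGameOver(board):
    # Fixed behaviour: a full board also ends the game.
    return (checkGameOver(board, HUMAN)
            or checkGameOver(board, COMPUTER)
            or not getEmptySpots(board))


draw = [[ 1, -1,  1],
        [ 1, -1, -1],
        [-1,  1,  1]]
print(isGameOverWinsOnly(draw))  # False: the search would loop over zero cells
print(isGameOver(draw))          # True: evaluate() gets called on the draw
```

With the wins-only version, minPlay and maxPlay fall through to an empty loop on such a board and hand back 1000000 or -1000000 rather than a draw score.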

For minimax, it is probably a good idea to make sure there is an initial bestMove as well.

def minimax(currentBoard, player):
    if isGameOver(currentBoard):
        score = evaluate(currentBoard)
        return score
    allMoves = getEmptySpots(currentBoard)
    bestMove = allMoves[0]
    for cell in allMoves:
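
Putting both answers together, a complete search might look like the sketch below. It is not the asker's actual program: the helper bodies (checkGameOver, getEmptySpots, evaluate) and the scoring convention (+1 for a computer win, -1 for a human win, 0 for a draw) are assumptions, minimax is assumed to be called for the computer, and returning None on a finished board is a choice made for this sketch rather than the original behaviour.

```python
HUMAN = 1
COMPUTER = -1

# All eight winning lines as lists of (row, column) pairs.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def checkGameOver(board, player):
    # Assumed behaviour: True if `player` holds a complete line.
    return any(all(board[r][c] == player for r, c in line) for line in LINES)


def getEmptySpots(board):
    return [[r, c] for r in range(3) for c in range(3) if board[r][c] == 0]


def isGameOver(board):
    return (checkGameOver(board, HUMAN) or checkGameOver(board, COMPUTER)
            or not getEmptySpots(board))


def evaluate(board):
    # Assumed scoring, seen from the computer's side.
    if checkGameOver(board, COMPUTER):
        return 1
    if checkGameOver(board, HUMAN):
        return -1
    return 0


def minPlay(board, player):
    if isGameOver(board):
        return evaluate(board)
    bestScore = 1000000                      # initialized before the loop
    for x, y in getEmptySpots(board):
        board[x][y] = player
        bestScore = min(bestScore, maxPlay(board, -player))
        board[x][y] = 0                      # undo the trial move
    return bestScore                         # returned after the loop


def maxPlay(board, player):
    if isGameOver(board):
        return evaluate(board)
    bestScore = -1000000
    for x, y in getEmptySpots(board):
        board[x][y] = player
        bestScore = max(bestScore, minPlay(board, -player))
        board[x][y] = 0
    return bestScore


def minimax(board, player):
    # Assumption: called with player == COMPUTER, the maximizing side.
    if isGameOver(board):
        return None                          # no move left to choose
    allMoves = getEmptySpots(board)
    bestMove = allMoves[0]
    bestScore = -1000000
    for cell in allMoves:
        x, y = cell
        board[x][y] = player
        score = minPlay(board, -player)
        board[x][y] = 0
        if score > bestScore:
            bestScore = score
            bestMove = cell
    return bestMove


# The human (1) threatens to complete the middle row at [1][0].
board = [[ 0, -1,  0],
         [ 0,  1,  1],
         [ 0,  0, -1]]
print(minimax(board, COMPUTER))
```

On this board the call should print [1, 0], the only move that stops the human from completing the middle row; the first empty cell, [0, 0], loses immediately. Because the scores carry no depth information, the search does not prefer a quick win over a slower forced one.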


python

algorithm

minimax