Why is Alpha/Beta pruning having no effect on my MiniMax algorithm?

First of all, I'm sorry the title is slightly inaccurate; I just didn't want it to be 30 words long. The alpha/beta pruning I implemented drastically reduced the number of evaluations when I applied it to my TicTacToe game, see for yourself below.

Each pair of evaluation counts was measured with the same game state as input.

The problem appeared when I tried to implement the pruning for the checkers-playing neural network I've been working on, which is what this whole thing was about to begin with; I only whipped up the TicTacToe game to experiment with MiniMax + Alpha/Beta because I had never dealt with these algorithms before.

Below is the same kind of experiment carried out with the NN.

Now for the code (checkers; let me know if you'd like to take a look at the TicTacToe version, although they are almost identical).

I'm only going to paste the start of the two methods once because it's exactly the same; I'll show both signatures because they differ slightly.

A small note to make the code clearer.

Board is the object that keeps track of the pieces, the available moves, whose turn it is, whether the game has been won/drawn, etc.

Move is the object that contains all the information pertinent to a move. When I make the clone on the first line of the method, I simply clone the given board, and the constructor applies the given move to it.
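For reference, this is roughly what that clone constructor is assumed to do (copyStateFrom and applyMove are hypothetical helper names, not the actual implementation):

public Testboard(Board original, Move move) {
    // Deep-copy the pieces, the turn and the game status from the original board
    copyStateFrom(original);
    // Play the given move on the copy, leaving the original board untouched
    applyMove(move);
}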

private double miniMax(Board b, Move m, int depth) {

private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {
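
Not shown here, but the root-level calls are assumed to look something like this; board and DEPTH are placeholders, and -1 and 1 are safe initial bounds only because every evaluation in this code stays within [-1, 1] (negative/positive infinity would work just as well):

for (Move move : board.getMoves()) {
    // Plain minimax needs no window
    double plainScore  = miniMax(board, move, DEPTH);
    // Alpha/beta starts with the widest possible window for these evaluations
    double prunedScore = alphaBeta(board, move, DEPTH, -1, 1);
    // Both calls should return the same value for the same move;
    // only the number of evaluations performed should differ
    System.out.println(move + ": " + plainScore + " / " + prunedScore);
}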

Start of both methods:

    Testboard clone = new Testboard(b, m);
    // Making a clone of the board in order to
    // avoid making changes to the original one

    if (clone.isGameOver()) {

        if (clone.getLoser() == null) 
            // It's a draw, evaluation = 0
            return 0;   

        if (clone.getLoser() == Color.BLACK)
            // White (Max) won, evaluation = 1
            return 1;

        // Black (Min) won, evaluation = -1
        return -1;  
    } 

    if (depth == 0) 
        // Reached the end of the search, returning current Evaluation of the board
        return getEvaluation(clone);

Regular MiniMax continuation:

    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {

        // It's white's turn (Maxing player)
        double max = -1;
        for (Move move : clone.getMoves()) {
            // For each child node (available move)
            // its minimax value is calculated
            double score = miniMax(clone, move, depth-1);
            // Only the highest score is stored
            if (score > max)
                max = score;
        }
        // And is returned
        return max;
    } 

    // It's black's turn (Min player)
    double min = 1;
    for (Move move : clone.getMoves()) {
        // For each child node (available move)
        // its minimax value is calculated
        double score = miniMax(clone, move, depth-1);
        // Only the lowest score is stored
        if (score < min)
            min = score;
    }
    // And is returned
    return min;
}

MiniMax with Alpha/Beta pruning continuation:

    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {

        // It's white's turn (Maxing player)
        for (Move move : clone.getMoves()) {

            // For each child node (available move)
            // its minimax value is calculated
            double score = alphaBeta(clone, move, depth-1, alpha, beta);

            if (score > alpha)
                // If this score is greater than alpha
                // It is assigned to alpha as the new highest score
                alpha = score;
            if (alpha >= beta)
                // The cycle is interrupted early if the value of alpha equals or is greater than beta
                break;
        }
        // The alpha value is returned
        return alpha;
    } 

    // It's black's turn (Min player)
    for (Move move : clone.getMoves()) {

        // For each child node (available move)
        // its minimax value is calculated
        double score = alphaBeta(clone, move, depth-1, alpha, beta);

        if (score < beta)
            // If this score is lower than beta
            // It is assigned to beta as the new lowest score
            beta = score;
        if (alpha >= beta)
            // The cycle is interrupted early if the value of alpha equals or is greater than beta
            break;
    }
    // The beta value is returned
    return beta;
}

Honestly I'm stuck, and I'm not sure what I can do to try and figure out what's going on. I've tried MiniMax + A/B on several different and even randomly generated neural networks, but I've never seen any improvement in the number of evaluations. I'm hoping someone here can shed some light on this situation, thanks!

Thanks to @maraca for helping me figure this out; since he only replied with a comment, I'm going to answer my own question.

There was nothing wrong with the code I posted; the problem was the evaluation function I was using once the search reached the depth limit.

I was using a still-untrained neural network that was essentially just spitting out random values, which forced MiniMax + A/B to traverse every node: there was no consistency in the answers, and that consistency is exactly what pruning relies on.
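
A quick way to double-check a problem like this is to temporarily swap the untrained network for a deterministic heuristic, for example a simple material count scaled into [-1, 1], and see whether the evaluation counts drop again. Something along these lines (countPieces is a hypothetical accessor, adapt it to whatever your board actually exposes):

// Sanity-check evaluation: deterministic, consistent and bounded to (-1, 1).
// If alphaBeta starts pruning with this but not with the untrained NN,
// the evaluation function is the culprit.
private double getEvaluation(Testboard board) {
    int white = board.countPieces(Color.WHITE);   // hypothetical accessor
    int black = board.countPieces(Color.BLACK);   // hypothetical accessor
    // 12 pieces per side, so dividing by 24 keeps the value strictly between
    // -1 and 1, and terminal wins/losses (1 and -1) still dominate
    return (white - black) / 24.0;
}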