Implementing alpha beta pruning in a TicTacToe minimax algorithm

Question

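In my method newminimax49 I have a minimax algorithm that uses memoization and other general improvements which were suggested to me earlier. The method uses a simple heuristic board evaluation function. My question is basically about alpha beta pruning, namely whether or not my minimax method uses it. As far as I can tell it does, but what I used to achieve it seems too simple to be true. Furthermore, others have suggested that I use alpha beta pruning, which, as I said, I thought my minimax method already does, and that leads me to believe that what I'm doing here is something else. So here is my newminimax49: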

//This method returns a 2 element int array containing the position of the best possible 
//next move and the score it yields. Utilizes memoization and supposedly alpha beta 
//pruning to achieve better performance. Alpha beta pruning can be seen in lines such as:
/*if(bestScore==-10)
     break;*/
//This basically means that if the best score achieved is the best possible score
//achievable then stop exploring the other available moves. Doing this, I believe
//I'm applying the same principle as alpha beta pruning.
public int[] newminimax49(){
    int bestScore = (turn == 'O') ? +9 : -9;    //X is minimizer, O is maximizer
    int bestPos=-1;
    int currentScore;
    //boardShow();
    String stateString = "";                                                
    for (int i=0; i<state.length; i++) 
        stateString += state[i];                        
    int[] oldAnswer = oldAnswers.get(stateString);                          
    if (oldAnswer != null) 
        return oldAnswer;
    if(isGameOver2()!='N'){
        //s.boardShow();
        bestScore= score();
    }
    else{
        //s.boardShow();
        int i=0;
        for(int x:getAvailableMoves()){
            if(turn=='X'){  //X is minimizer
                setX(x);
                //boardShow();
                //System.out.println(stateID++);
                currentScore = newminimax49()[0];
                revert(x);
                if(i==0){
                    bestScore = currentScore;
                    bestPos=x;
                    if(bestScore==-10)
                        break;
                }
                else if(currentScore<bestScore){
                    bestScore = currentScore;
                    bestPos=x;
                    if(bestScore==-10)
                        break;
                }
            }
            else {  //O is maximizer
                setO(x);
                //boardShow();
                //System.out.println(stateID++);
                currentScore = newminimax49()[0];
                revert(x);
                //boardShow();
                if(i==0){
                    bestScore = currentScore;
                    bestPos=x;
                    if(bestScore==10)
                        break;
                }

                else if(currentScore>bestScore){
                    bestScore = currentScore;
                    bestPos = x;
                    if(bestScore==10)
                        break;
                }
            }
            i++;
        }
    }
    int[] answer = {bestScore, bestPos};                                    
    oldAnswers.put (stateString, answer);                                   
    return answer;
}

The fields and constructor used in my class State2:

private char [] state;  //Actual content of the board
private char turn;  //Whose turn it is
private Map<String,int[]> oldAnswers; //Used for memoization. It saves every state along with the score it yielded which allows us to stop exploring the children of a certain node if a similar node's score has been previously calculated. The key is the board state(i.e OX------X for example), the int array is a 2 element array containing the score and position of last placed seed of the state.  
private Map<Integer, int []> RowCol; //A mapping of positions from a board represented as a normal array to a board represented as a 2d array. For example: The position 0 maps to 0,0 on a 2d array board, 1 maps to 0,1 and so on.
private static int n;   //Size of the board
private static int stateID; //A simple counter used to show the number of recursive calls in the newminimax49 method.
private static int countX, countO; //Number of placed Xs and Os
private static int lastAdded; //Position of last placed seed
private char [][] DDState; //A 2d array representing the board. Contains the same values as state[]. Used for simplicity in functions that check the state of the board.

public State2(int n){
    int a=0;
    State2.n=n;
    state=new char[n*n];
    RowCol=new HashMap<Integer, int []>();
    countX=0;
    countO=0;
    //Initializing the board with empty slots
    for(int i = 0; i<state.length; i++){
        state[i]='-';
    }
    //Mapping
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            RowCol.put(a, new int[]{i, j});
            a++;
        }
    }
    a=0;
    DDState=new char[n][n];
    //Initializing the 2d array with the values from state[](empty slots)
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            DDState[i][j]=state[a];
            a++;
        }
    }
    oldAnswers = new HashMap<String,int[]>();
}

Supplementary methods:

getAvailableMoves returns an array containing the empty slots on the board (i.e. the possible next moves).

public int[] getAvailableMoves(){
    int count=0;
    int i=0;
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            count++;
    }
    int [] availableSlots = new int[count];
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            availableSlots[i++]=j;      
    }
    return availableSlots;
}

isGameOver2() simply checks the current state of the board to determine whether the game is over. It returns a char 'X', 'O', 'D' or 'N', standing for X won, O won, draw and not game over respectively.

public char isGameOver2(){
    char turnOpp;
    int count;
    if(turn=='X'){
        count=countO;
        turnOpp='O';
    }
    else {
        count=countX;
        turnOpp='X';
    }
    if(count>=n){ 
        //^No win available if each player has less than n seeds on the board

        //Checking begins
                //DDState[RowCol.get(lastAdded)[0]][RowCol.get(lastAdded)[1]]=turn;

                //Check column for win
                for(int i=0; i<n; i++){
                    if(DDState[i][RowCol.get(lastAdded)[1]]!=turnOpp)
                        break;
                    if(i==(n-1)){
                        //DDState[RowCol.get(x)[0]][RowCol.get(x)[1]]='-';
                        return turnOpp;
                    }
                }

                //Check row for win
                for(int i=0; i<n; i++){
                    if(DDState[RowCol.get(lastAdded)[0]][i]!=turnOpp)
                        break;
                    if(i==(n-1)){
                        //DDState[RowCol.get(x)[0]][RowCol.get(x)[1]]='-';
                        return turnOpp;
                    }
                }

                //Check diagonal for win
                if(RowCol.get(lastAdded)[0] == RowCol.get(lastAdded)[1]){

                    //we're on a diagonal
                    for(int i = 0; i < n; i++){
                        if(DDState[i][i] != turnOpp)
                            break;
                        if(i == n-1){
                            //DDState[RowCol.get(x)[0]][RowCol.get(x)[1]]='-';
                            return turnOpp;
                        }
                    }
                }

                //check anti diagonal 
                for(int i = 0; i<n; i++){
                    if(DDState[i][(n-1)-i] != turnOpp)
                        break;
                    if(i == n-1){
                        //DDState[RowCol.get(x)[0]][RowCol.get(x)[1]]='-';
                        return turnOpp;
                    }
                }

                //check for draw
                if((countX+countO)==(n*n))
                    return 'D';
            }
    return 'N';
}

boardShow prints a matrix display of the current state of the board:

public void boardShow(){
    if(n==3){
        System.out.println(stateID);
        for(int i=0; i<=6;i+=3)
            System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]");
        System.out.println("***********");
    }
    else {
        System.out.println(stateID);
        for(int i=0; i<=12;i+=4)
            System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]"+" ["+state[i+3]+"]");
        System.out.println("***********");
    }   
}

score is a simple evaluation function that returns +10 for an O win, -10 for an X win and 0 for a draw:

public int score(){
    if(isGameOver2()=='X')
        return -10;
    else if(isGameOver2()=='O')
        return +10;
    else 
        return 0;
}

The seed setters:

//Sets an X at a certain location and updates the turn, countX and lastAdded variables
public void setX(int i){
    state[i]='X';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='X';
    turn='O';
    countX++;
    lastAdded=i;
}

//Sets an O at a certain location and updates the turn, countO and lastAdded variables
public void setO(int i){
    state[i]='O';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='O';
    turn='X';
    countO++;
    lastAdded=i;
}

revert simply undoes a move. For example, if an X was placed at position 0, revert(0) puts a '-' in its place and updates the variables changed by setX:

public void revert(int i){
    state[i]='-';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='-';
    if(turn=='X'){
        turn = 'O';
        countO--;
    }
    else {
        turn = 'X';
        countX--;
    }
}

So does this look like alpha beta pruning to you? If not, how can I implement it?

Answer 1

You are already using a kind of "simplified" alpha-beta: at the moment, you prune whenever a player finds a winning position.

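A proper alpha-beta search passes an alpha value and a beta value down to its recursive calls. Alpha is the minimum score the maximizing player is already guaranteed, and beta is the maximum score the minimizing player is already guaranteed. With those bounds, you prune as soon as a move's score is no better than the opposing player's current "worst case".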

In your case, you would be able to prune not only on winning scores (as you currently do) but also on some scores of 0.
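To make the difference concrete, here is a minimal sketch of a full alpha-beta search for a 3x3 board. It is not a drop-in change to newminimax49: the class AlphaBetaSketch and its methods winnerScore, isFull, alphaBeta and bestMove are hypothetical names made up for this illustration, the board is a plain char[9], and memoization is left out. It keeps the question's conventions (O maximizes, X minimizes, scores are +10, -10 and 0) and, as an assumption, starts the search window at [-10, +10], since no score can fall outside that range.

```java
// Hypothetical stand-alone sketch, not the asker's State2 class.
class AlphaBetaSketch {
    // All eight winning lines of a 3x3 board, as indices into a char[9].
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    // +10 if O owns a full line, -10 if X does, 0 otherwise.
    static int winnerScore(char[] b) {
        for (int[] l : LINES) {
            char c = b[l[0]];
            if (c != '-' && c == b[l[1]] && c == b[l[2]])
                return c == 'O' ? 10 : -10;
        }
        return 0;
    }

    static boolean isFull(char[] b) {
        for (char c : b)
            if (c == '-') return false;
        return true;
    }

    // alpha: score the maximizer (O) is already guaranteed further up the tree.
    // beta:  score the minimizer (X) is already guaranteed further up the tree.
    static int alphaBeta(char[] b, char turn, int alpha, int beta) {
        int w = winnerScore(b);
        if (w != 0 || isFull(b)) return w;      // win, loss or draw
        int best = (turn == 'O') ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int i = 0; i < b.length; i++) {
            if (b[i] != '-') continue;
            b[i] = turn;                                         // make the move
            int s = alphaBeta(b, turn == 'O' ? 'X' : 'O', alpha, beta);
            b[i] = '-';                                          // undo the move
            if (turn == 'O') {
                best = Math.max(best, s);
                alpha = Math.max(alpha, best);
            } else {
                best = Math.min(best, s);
                beta = Math.min(beta, best);
            }
            // The opponent already has an alternative at least this good,
            // so the remaining moves of this node cannot matter.
            if (alpha >= beta) break;
        }
        return best;
    }

    // Returns {score, position}, the same layout as the question's answer array.
    // Assumes the game is not over and at least one cell is empty.
    static int[] bestMove(char[] b, char turn) {
        int alpha = -10, beta = 10, bestPos = -1;
        int bestScore = (turn == 'O') ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int i = 0; i < b.length && alpha < beta; i++) {
            if (b[i] != '-') continue;
            b[i] = turn;
            int s = alphaBeta(b, turn == 'O' ? 'X' : 'O', alpha, beta);
            b[i] = '-';
            if (turn == 'O' && s > bestScore) {
                bestScore = s; bestPos = i; alpha = Math.max(alpha, s);
            } else if (turn == 'X' && s < bestScore) {
                bestScore = s; bestPos = i; beta = Math.min(beta, s);
            }
        }
        return new int[]{bestScore, bestPos};
    }

    public static void main(String[] args) {
        int[] r = bestMove("XX-OO----".toCharArray(), 'X');
        System.out.println("score=" + r[0] + " position=" + r[1]);
    }
}
```

With the window starting at [-10, +10], the check alpha >= beta fires on a found win exactly like the existing bestScore==-10 and bestScore==10 breaks, and it also fires once one side has secured a 0 and the other side cannot do better than 0. One caveat if this is combined with the oldAnswers map: a value returned after a cutoff is only a bound on the true score, so caching it as if it were exact may give wrong results.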

