TicTacToe minimax 算法 returns 4x4 游戏中的意外结果

TicTacToe minimax algorithm returns unexpected results in 4x4 games

在我的方法 newminimax499 中,我有一个使用记忆和 alpha beta p运行ing 的 minimax 算法。该方法通常适用于 3x3 游戏,但是当我玩 4x4 游戏时,我会得到奇怪的、意想不到的计算机位置选择。他仍然从不输,但他似乎并不是为了赢而比赛。为了说明这里的问题,我们使用 2 个 3x3 和 4x4 游戏的场景。首先是一个 3x3 游戏的场景,其中玩家是 X 并迈出第一步:

这还不错,事实上,这是人们希望计算机执行的操作。现在来看一个 4x4 游戏的场景。同样,O 是计算机,X 启动:

如您所见,计算机只是将 Os 按系统顺序一个接一个地放置,并且只有在它有可能获胜时才打破该顺序来阻止 X。这是非常防守的打法,不像在 3x3 比赛中看到的那样。那么为什么该方法对于 3x3 和 4x4 表现不同?

代码如下:

//This method returns a 2 element int array containing the position of the best possible 
//next move and the score it yields. Utilizes memoization and  alpha beta 
//pruning to achieve better performance. 
public int[] newminimax499(int a, int b){
    //int bestScore = (turn == 'O') ? +9 : -9;  //X is minimizer, O is maximizer
    int bestPos=-1;
    int alpha= a;
    int beta= b;
    int currentScore;
    //boardShow();
    String stateString = "";                                                
    for (int i=0; i<state.length; i++) 
        stateString += state[i];                        
    int[] oldAnswer = oldAnswers.get(stateString);                          
    if (oldAnswer != null) 
        return oldAnswer;
    if(isGameOver()!='N'){
        int[] answer = {score(), bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else{
        for(int x:getAvailableMoves()){
            if(turn=='X'){  //X is minimizer
                setX(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                revert(x);
                if(currentScore<beta){
                    beta=currentScore;
                    bestPos=x;
                }
                if(alpha>=beta){
                    break;
                }
            }
            else {  //O is maximizer
                setO(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                revert(x);
                if(currentScore>alpha){
                    alpha=currentScore;
                    bestPos=x;
                }
                if(alpha>=beta){
                    break;
                }
            }
        }
    }
    if(turn=='X'){
        int[] answer = {beta, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else { 
        int[] answer = {alpha, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
}

以下是你们 运行 代码所需的其他组件和补充方法。 我的 class State2:

中使用的字段和构造函数
private char [] state;  //Actual content of the board
private char turn;  //Whose turn it is
private Map<String,int[]> oldAnswers; //Used for memoization. It saves every state along with the score it yielded which allows us to stop exploring the children of a certain node if a similar node's score has been previously calculated. The key is the board state(i.e OX------X for example), the int array is a 2 element array containing the score and position of last placed seed of the state.  
private Map<Integer, int []> RowCol; //A mapping of positions from a board represented as a normal array to a board represented as a 2d array. For example: The position 0 maps to 0,0 on a 2d array board, 1 maps to 0,1 and so on.
private static int n;   //Size of the board
private static int stateID; //An simple incrementer used to show number of recursive calls in the newminiax49 method. 
private static int countX, countO; //Number of placed Xs and Os
private static int lastAdded; //Position of last placed seed
private char [][] DDState; //A 2d array representing the board. Contains the same values as state[]. Used for simplicity in functions that check the state of the board.

public State2(int n){
    int a=0;
    State2.n=n;
    state=new char[n*n];
    RowCol=new HashMap<Integer, int []>();
    countX=0;
    countO=0;
    //Initializing the board with empty slots
    for(int i = 0; i<state.length; i++){
        state[i]='-';
    }
    //Mapping
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            RowCol.put(a, new int[]{i, j});
            a++;
        }
    }
    a=0;
    DDState=new char[n][n];
    //Initializing the 2d array with the values from state[](empty slots)
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            DDState[i][j]=state[a];
            a++;
        }
    }
    oldAnswers = new HashMap<String,int[]>();
}

补充方法:

getAvailableMoves,returns 一个包含棋盘上空位的数组(即可能的下一步)。

public int[] getAvailableMoves(){
    int count=0;
    int i=0;
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            count++;
    }
    int [] availableSlots = new int[count];
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            availableSlots[i++]=j;      
    }
    return availableSlots;
}

isGameOver2(),简单地检查棋盘的当前状态以判断游戏是否结束。 returns 一个字符 'X'、'O'、'D' 和 'N' 分别代表 X 赢了、O 赢了、平局和没有游戏结束。

public char isGameOver2(){
    char turnOpp;
    int count;
    if(turn=='X'){
        count=countO;
        turnOpp='O';
    }
    else {
        count=countX;
        turnOpp='X';
    }
    if(count>=n){ 
        for(int i=0; i<n; i++){
            if(DDState[i][RowCol.get(lastAdded)[1]]!=turnOpp)
                break;
            if(i==(n-1)){
                return turnOpp;
            }
        }

        //Check row for win
        for(int i=0; i<n; i++){
            if(DDState[RowCol.get(lastAdded)[0]][i]!=turnOpp)
                break;
            if(i==(n-1)){
                return turnOpp;
            }
        }

        //Check diagonal for win
        if(RowCol.get(lastAdded)[0] == RowCol.get(lastAdded)[1]){

            //we're on a diagonal
            for(int i = 0; i < n; i++){
                if(DDState[i][i] != turnOpp)
                    break;
                if(i == n-1){
                    return turnOpp;
                }
            }
        }

        //check anti diagonal 

        for(int i = 0; i<n; i++){
            if(DDState[i][(n-1)-i] != turnOpp)
                break;
            if(i == n-1){
                return turnOpp;
            }
        }

        //check for draw
        if((countX+countO)==(n*n))
            return 'D';

            }
    return 'N';
}

boardShow,returns当前棋盘状态的矩阵显示:

public void boardShow(){
    if(n==3){
        System.out.println(stateID);
        for(int i=0; i<=6;i+=3)
            System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]");
        System.out.println("***********");
    }
    else {
        System.out.println(stateID);
        for(int i=0; i<=12;i+=4)
            System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]"+" ["+state[i+3]+"]");
        System.out.println("***********");
    }   
}

score,是一个简单的评估函数,returns +10 表示 O 赢,-10 表示 X 赢,0 表示平局:

public int score(){
    if(isGameOver2()=='X')
        return -10;
    else if(isGameOver2()=='O')
        return +10;
    else 
        return 0;
}

播种机:

//Sets an X at a certain location and updates the turn, countX and lastAdded variables
public void setX(int i){
    state[i]='X';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='X';
    turn='O';
    countX++;
    lastAdded=i;
}

//Sets an O at a certain location and updates the turn, countO and lastAdded variables
public void setO(int i){
    state[i]='O';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='O';
    turn='X';
    countO++;
    lastAdded=i;
}

还原,简单地还原一个动作。例如,如果一个 X 被放置在位置 0 revert(0) 设置一个 '-' 在它的位置并更新由 setX:

更改的变量
public void revert(int i){
    state[i]='-';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='-';
    if(turn=='X'){
        turn = 'O';
        countO--;
    }
    else {
        turn = 'X';
        countX--;
    }
}

主要方法可能看起来像:

public static void main(String[] args) {
    State2 s=new State2(4);
    int [] results=new int[2];
    s.setX(0);
    long startTime = System.currentTimeMillis();
    results=s.newminimax499(Integer.MIN_VALUE,Integer.MAX_VALUE);
    long endTime = System.currentTimeMillis();
    System.out.println("Score: "+results[0]+" Position: "+ results[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();

}

我不相信这里有一个错误 -- 如果 O 在较早的位置之一下棋,它会分叉,而如果它在中间下棋,它会强制平局。估计4x4的游戏更难win/lose,所以电脑无所谓的选择第一个空方格

下面,1表示O的强制应战,2表示X的分叉走法,?表示可能的获胜位置。

X|O|
-+-+-
2|X|?
-+-+-
?| |1

X| |O
-+-+-
X|2|?
-+-+-
1| |?

X|2|?
-+-+-
O|X|
-+-+-
 |?|1

我放弃了尝试理解代码 - 房间开始变得对我的同事来说太吵了。

但是 - 我确实整理了我自己的解决方案 - 我觉得它可能值得 posting。它的设计并不是特别高效——事实上我还在等待它在 4X4 测试中迈出第一步,但它似乎在 3X3 上做正确的事情(即与自己平局并战胜愚蠢的策略) .

它应该能够处理不等尺寸的板。对角线是使用对角线步长从板的一侧延伸到另一侧的任何正方形线。

一旦它完成它的 4X4 运行 我将 pos 它的移动进行比较。

public class TicTacToe {

    /**
     * Strategies for players.
     */
    interface Strategy {

        /**
         * Think up a place to move.
         *
         * @param board - The current board position.
         * @return Where to move.
         */
        Optional<Board.Square> think(Board board, Player player);
    }

    /**
     * Stupid strategy just finds the first empty square and plays it.
     */
    static final Strategy stupid = (Board board, Player player) -> {
        List<Board.Square> empty = board.getEmpties();
        if (empty.size() > 0) {
            return Optional.of(empty.get(0));
        }
        return Optional.empty();
    };

    /**
     * Minimax strategy.
     */
    static final Strategy minimax = new Strategy() {
        // Divide by this at each depth step.
        private static final double FACTOR = 3;

        // Memoize each discovered best move.
        Map<Player, Map<String, Board.Place>> best = new HashMap<>();

        {
            // One map for each player.
            for (Player p : Player.values()) {
                best.put(p, new HashMap<>());
            }
        }

        @Override
        public Optional<Board.Square> think(Board board, Player player) {
            Optional<Board.Square> win = Optional.empty();
            // Seen this one before?
            Board.Place seen = best.get(player).get(board.toString());
            if (seen == null) {
                double winValue = Double.NEGATIVE_INFINITY;
                for (Board.Square play : board.getEmpties()) {
                    double value = value(board, player, play, 1);
                    if (value > winValue) {
                        win = Optional.of(play);
                        winValue = value;
                    }
                }
                if (win.isPresent()) {
                    // Record my result.
                    best.get(player).put(board.toString(), win.get().place);
                } else {
                    System.out.println("No best found for " + board);
                }
            } else {
                // Reuse it.
                win = Optional.of(board.square(seen));
            }
            return win;
        }

        private double value(Board board, Player player, Board.Square move, double scale) {
            // My value.
            double value = 0;
            // Make the move.
            board.mark(player, move);
            try {
                // A win?
                if (!board.win()) {
                    // Remaining empties.
                    List<Board.Square> empties = board.getEmpties();
                    if (!empties.isEmpty()) {
                        // Record it's value.
                        double[] values = new double[empties.size()];
                        for (int i = 0; i < values.length; i++) {
                            // Try each further move negating and downscaling at each level.
                            values[i] = value(board, player.next(), empties.get(i), (scale / FACTOR) * -1);
                        }
                        // Add them up.
                        value += DoubleStream.of(values).sum();
                    }
                } else {
                    // Somebody already won.
                    value = scale;
                }
            } finally {
                //System.out.println("Value of " + board + " to player " + player + " = " + value);
                // Unroll the move.
                board.unmark(player, move);
            }
            return value;
        }
    };

    /**
     * There are exactly two players.
     */
    enum Player {

        O, X;

        /**
         * Each player has a strategy.
         *
         * Defaults to stupid.
         */
        private Strategy strategy = stupid;

        /**
         * What mark they make on the board.
         *
         * @return String representing the mark they make.
         */
        public char mark() {
            // The mark they make is the first character of their name.
            return name().charAt(0);
        }

        /**
         * Who is the next player.
         *
         * @return The other player.
         */
        public Player next() {
            return other(this);
        }

        /**
         * Who is the other player.
         *
         * @return The other player.
         */
        static Player other(Player it) {
            return it == O ? X : O;
        }

        public Strategy getStrategy() {
            return strategy;
        }

        public Player setStrategy(Strategy strategy) {
            this.strategy = strategy;
            return this;
        }
    }

    /**
     * The board.
     */
    static class Board {

        /**
         * Top left corner is 0,0. Access is [y][x].
         */
        final Square[][] board;
        // The board as columns.
        final List<List<Square>> columns;
        // The dagonals.
        final List<List<Square>> diagonals;
        // For sense checking.
        final int width;
        final int height;
        // Counts for each player - useful optimisation.
        final int[] counts = new int[Player.values().length];

        // A clear square.
        private static final char Clear = ' ';

        public Board(int width, int height) {
            this.width = width;
            this.height = height;
            board = new Square[height][];
            for (int y = 0; y < height; y++) {
                board[y] = new Square[width];
                // Fill it.
                for (int x = 0; x < width; x++) {
                    board[y][x] = new Square(x, y);
                }
            }
            // Build my columns lists.
            columns = new ArrayList<>();
            for (int x = 0; x < width; x++) {
                List<Square> column = new ArrayList<>();
                for (int y = 0; y < height; y++) {
                    column.add(board[y][x]);
                }
                columns.add(column);
            }
            // And the diagonals.
            diagonals = new ArrayList<>();
            for (int y = 0; y <= height - width; y++) {
                List<Square> diagonal = new ArrayList<>();
                // Left to right.
                for (int x = 0; x < width; x++) {
                    diagonal.add(board[y + x][x]);
                }
                diagonals.add(diagonal);
                // Right to left.
                diagonal = new ArrayList<>();
                for (int x = 0; x < width; x++) {
                    diagonal.add(board[y + x][width - 1 - x]);
                }
                diagonals.add(diagonal);
            }
        }

        public Board(int size) {
            this(size, size);
        }

        private Stream<List<Square>> asRows() {
            // Map each row to a list.
            return Arrays.stream(board).map(r -> Arrays.asList(r));
        }

        private Stream<List<Square>> asCols() {
            // Walk the columns.
            return columns.stream();
        }

        private Stream<List<Square>> asDiagonals() {
            // Walk the diagonals.
            return diagonals.stream();
        }

        private boolean winning(List<Square> row) {
            //System.out.println("winner(" + row + ")");
            if (row.isEmpty()) {
                return false;
            }
            Optional<Player> owner = row.get(0).owner;
            // First square owned.
            if (!owner.isPresent()) {
                return false;
            }
            return !row.stream()
                    //.peek(s -> System.out.println(s))
                    .anyMatch((s) -> (!s.owner.isPresent() || s.owner.get() != owner.get()));
        }

        public Square square(Place place) {
            return board[place.y][place.x];
        }

        private Optional<Player> getWinner() {
            Optional<Player> winner = Optional.empty();
            // Must be at least enough plays by one player to reach across/down the board.
            OptionalInt min = IntStream.of(counts).min();
            if (!min.isPresent() || min.getAsInt() < Math.min(width, height)) {
                // Nobody can posibly have won.
                return winner;
            }
            List<List<Square>> winners = asRows()
                    .filter(r -> winning(r))
                    .collect(Collectors.toList());
            if (winners.isEmpty()) {
                winners = asCols()
                        .filter(r -> winning(r))
                        .collect(Collectors.toList());
            }
            if (winners.isEmpty()) {
                winners = asDiagonals()
                        .filter(r -> winning(r))
                        .collect(Collectors.toList());
            }
            if (!winners.isEmpty()) {
                winner = winners.get(0).get(0).owner;
            }
            return winner;
        }

        private boolean isDrawn() {
            // All square occupied - counts add up to width*height.
            return IntStream.of(counts).sum() == width * height;
        }

        private boolean win() {
            return getWinner().isPresent();
        }

        /**
         * One square of the board.
         *
         * Remember that a Square is ON a board - i.e. it is a sub-object of a Board. Do not attempt to memoize Squares as they will hold a reference to their parent board.
         */
        class Square {

            // The owner of it.
            Optional<Player> owner = Optional.empty();
            // where it is on the board.
            final Place place;

            public Square(Place place) {
                this.place = place;
            }

            public Square(int x, int y) {
                this(new Place(x, y));
            }

            public char getMark() {
                return owner.isPresent() ? owner.get().mark() : Clear;
            }

            public boolean isTaken() {
                return owner.isPresent();
            }

            @Override
            public String toString() {
                return place + "(" + (owner.isPresent() ? owner.get().toString() : "") + ")";
            }

            private void take(Player player) {
                if (!isTaken()) {
                    owner = Optional.of(player);
                    // Keep track of the counts.
                    counts[player.ordinal()] += 1;
                } else {
                    throw new IllegalStateException("Attempt to take taken square!"
                            + " Square=" + this
                            + " Player=" + player
                            + " board=" + Board.this.toString()
                    );
                }
            }

            private void release(Player player) {
                if (isTaken()) {
                    if (owner.get() == player) {
                        owner = Optional.empty();
                        // Keep track of the counts.
                        counts[player.ordinal()] -= 1;
                    } else {
                        throw new IllegalStateException("Attempt to release other player's square!"
                                + " Square=" + this
                                + " Player=" + player
                                + " board=" + Board.this.toString()
                        );
                    }
                } else {
                    throw new IllegalStateException("Attempt to release untaken square!"
                            + " Square=" + this
                            + " Player=" + player
                            + " board=" + Board.this.toString()
                    );
                }
            }

        }

        private List<Square> getEmpties() {
            // Walk all squares.
            return Arrays.stream(board)
                    // Unroll each row.
                    .flatMap(l -> Arrays.stream(l))
                    // All not taken.
                    .filter(s -> !s.isTaken())
                    // As a list.
                    .collect(Collectors.toList());
        }

        /**
         * A place on the board.
         */
        class Place {

            final int x;
            final int y;

            public Place(int x, int y) {
                // Sense check.
                if (x < 0 || x >= width) {
                    throw new IllegalArgumentException("Off board: x = " + x);
                }
                if (y < 0 || y >= height) {
                    throw new IllegalArgumentException("Off board: x = " + x);
                }
                this.x = x;
                this.y = y;
            }

            @Override
            public String toString() {
                return "{" + x + "," + y + '}';
            }

        }

        /**
         * Mark a place on the board.
         *
         * @param player
         */
        public void mark(Player player, Square square) {
            // Take the square.
            square.take(player);
        }

        /**
         * UnMark a place on the board.
         *
         * Confirms the mark first.
         */
        public void unmark(Player player, Square square) {
            square.release(player);
        }

        @Override
        public String toString() {
            return Arrays.stream(board)
                    .map(r -> Arrays.stream(r)
                            .map(s -> String.valueOf(s.getMark()))
                            .collect(Collectors.joining("", "[", "]")))
                    .collect(Collectors.joining(""));
        }

    }

    class Game {

        // The current board state.
        final Board board;

        public Game(Board board) {
            this.board = board;
        }

        public Game(int size) {
            this(new Board(size));
        }

        public Game(int width, int height) {
            this(new Board(width, height));
        }

        private void play(Player firstPlayer) {
            boolean gameFinished = false;
            Player player = firstPlayer;
            do {
                Optional<Board.Square> move = player.getStrategy().think(board, player);
                if (move.isPresent()) {
                    board.mark(player, move.get());
                    System.out.println("Moved: " + move.get() + " -> " + board);
                } else {
                    System.out.println("No move!");
                }
                // Has anyone won?
                Optional<Player> winner = board.getWinner();
                if (winner.isPresent()) {
                    System.out.println("Player " + winner.get() + " won! " + board);
                    gameFinished = true;
                } else {
                    if (board.isDrawn()) {
                        System.out.println("Draw! " + board);
                        gameFinished = true;
                    }
                }
                // Next player.
                player = player.next();
            } while (!gameFinished);

        }

    }

    private void testStupid() {
        // Both start off stupid.
        new Game(3).play(Player.X);
    }

    private void testMinimax() {
        // Test minimax strategy.
        Board testBoard = new Board(3);
        // Set it up like http://neverstopbuilding.com/minimax
        testBoard.mark(Player.O, testBoard.board[0][0]);
        testBoard.mark(Player.X, testBoard.board[0][2]);
        testBoard.mark(Player.X, testBoard.board[1][0]);
        testBoard.mark(Player.X, testBoard.board[2][0]);
        testBoard.mark(Player.O, testBoard.board[2][1]);
        testBoard.mark(Player.O, testBoard.board[2][2]);
        // What does the strategy do?
        Optional<Board.Square> play = minimax.think(testBoard, Player.X);
        System.out.println("minimax(" + testBoard + ") = " + play);
    }

    private void test3X3Game() {
        // Play a game.
        // Change strategies.
        new Game(3).play(Player.X);
    }

    private void test4X4Game() {
        // Play a game.
        new Game(4).play(Player.X);
    }

    public void test() {
        testStupid();
        testMinimax();
        System.out.println("Test minmax vs stupid");
        Player.X.setStrategy(minimax);
        test3X3Game();
        System.out.println("Test minmax vs minmax");
        Player.O.setStrategy(minimax);
        test3X3Game();
        System.out.println("Test 4 X 4");
        test4X4Game();
    }

    public static void main(String args[]) {
        try {
            new TicTacToe().test();
        } catch (Throwable t) {
            t.printStackTrace(System.err);
        }
    }

}

当对自己玩 minimax 时,它会移动:

Moved: {1,1}(X) -> [   ][ X ][   ]
Moved: {0,0}(O) -> [O  ][ X ][   ]
Moved: {2,2}(X) -> [O  ][ X ][  X]
Moved: {0,2}(O) -> [O  ][ X ][O X]
Moved: {0,1}(X) -> [O  ][XX ][O X]
Moved: {2,1}(O) -> [O  ][XXO][O X]
Moved: {1,0}(X) -> [OX ][XXO][O X]
Moved: {1,2}(O) -> [OX ][XXO][OOX]
Moved: {2,0}(X) -> [OXX][XXO][OOX]
Draw! [OXX][XXO][OOX]

请注意,此 os 不是 OP pos 的移动,因此 OP 的算法显然有问题。