这个游戏的最佳算法是什么?
what is the best algorithm for this game?
我想为这个游戏做一个机器人:
游戏使用 8 张牌,编号为 1 到 8。游戏开始时,这些牌都在 table(桌子)上,正面朝上。引用自说明:
The goal of the game is that you reach the sum of 15 with 3 cards in your hand. The first person to reach this goal in a maximum of 30 turns is the winner.
Each time at your turn, you are supposed to take one card from the desk.
When you have 3 cards in your hand, you should swap one of your cards in your hand with a card on the desk.
双方玩家始终可以看到所有牌。
赢得比赛的最佳算法是什么?
您可以将游戏表示为有向图,节点是游戏中的状态,边是移动。
然后,您可以将最终输赢状态(即其中一名玩家达到 15 的状态)标记为赢或输。 (有些状态是“不可能的”,两个玩家都达到了 15)。
然后,您可以重复遍历所有状态,如果有一个获胜的动作,则将状态标记为获胜,如果所有动作都失败,则将状态标记为失败。你重复直到没有状态从未知变为输赢。
在像这样的游戏中,有些状态可能是平局,因为可以重复游戏状态。此处确实如此,因此可能是当您完成迭代时,某些状态既未标记为成功也未标记为失败。这些是重复抽奖。
通常在解决游戏时,您必须高效地进行迭代。这里的状态总数是1769,比较少,所以每次遍历所有状态就可以了。
这个答案的最后是解决游戏的代码 prints the optimal strategy。
在攻略中,要移动的玩家排在第一位。
例如策略中的这一行:
128 - 346 : WINS: 1->5 | LOSSES: 1->7, 2->7, 8->5, 8->7 | DRAWS: 2->5
表示要移动的玩家有牌 [1, 2, 8] 而他们的对手有牌 [3, 4, 6]。移动的玩家可以通过丢弃 1 并拿起 5 来获胜,如果移动的玩家丢弃 2 并拿起 5,则移动的玩家可以平局(双方玩家的最佳发挥),否则其他玩家可以强制获胜。
第一行:
/ - / : WINS: none | LOSSES: none | DRAWS: 1, 2, 3, 4, 5, 6, 7, 8
表示在初始状态下(/
表示没有手牌),那么无论你拿起什么牌,游戏都是平局。
import itertools
def states():
N = set(range(1, 9))
for k in range(7): # total number of cards picked up.
for p1 in itertools.combinations(N, k//2):
for p2 in itertools.combinations(N.difference(p1), (k+1)//2):
yield p1, p2
STATES = list(states())
PICKUPS = dict(((p1, p2), set(range(1, 9)).difference(p1).difference(p2)) for p1, p2 in STATES)
WL = dict()
def moves(p1, p2):
if len(p1) == 3 and len(p2) == 3:
for i in range(3):
for c2 in PICKUPS[p1, p2]:
new_p1 = p1[:i] + p1[i+1:] + (c2,)
yield p2, tuple(sorted(new_p1))
return
for c2 in PICKUPS[p1, p2]:
new_p1 = tuple(sorted(p1 + (c2,)))
yield p2, new_p1
def raw_score(S):
p1, p2 = S
w1 = len(p1) == 3 and sum(p1) == 15
w2 = len(p2) == 3 and sum(p2) == 15
if w1 and w2:
return 0
elif w1:
return 1
elif w2:
return -1
for S in STATES:
score = raw_score(S)
if score is not None:
WL[S] = score
while True:
modified = False
for S in STATES:
if S in WL:
continue
move_scores = [WL.get(x) for x in moves(*S)]
if -1 in move_scores:
WL[S] = 1
modified = True
if all(x == 1 for x in move_scores):
WL[S] = -1
modified = True
if not modified:
break
def droptake(old, new):
if len(old) == len(new):
return [x for x in old if x not in new][0], [x for x in new if x not in old][0]
else:
return None, [x for x in new if x not in old][0]
def describe(S, S2):
old = S[0]
new = S2[1]
drop, take = droptake(old, new)
if drop:
return '%d->%d' % (drop, take)
else:
return '%d' % take
def describe_state(S):
p1, p2 = S
return (''.join(str(x) for x in p1) or '/')+ ' - ' + (''.join(str(x) for x in p2) or '/')
index = dict()
for i, S in enumerate(STATES):
index[S] = i
for S in STATES:
wins = []
loses = []
draws = []
score = raw_score(S)
if score is not None:
if score != 0:
print(describe_state(S), ' :', '<WON>' if score == 1 else '<LOST>')
continue
for S2 in moves(*S):
score = WL.get(S2)
if score == -1:
wins.append(describe(S, S2))
elif score == 1:
loses.append(describe(S, S2))
else:
draws.append(describe(S, S2))
print(describe_state(S), ' :', 'WINS:', ', '.join(wins) or 'none', '| LOSSES:', ', '.join(loses) or 'none', '| DRAWS:', ', '.join(draws) or 'none')
我建议创建一个 minimax 算法,它(就像@PaulHankin 的回答)创建一个博弈树。不同之处在于状态是 发现的 随着移动的进行。
这个问题引起了我的兴趣,因此我在 JavaScript 中创建了一个交互式代码段,这很有趣。它向您显示可用的动作,以及它们是否会导致胜利、失败或平局。在赢或输的情况下,它会告诉你有多少次半步(步数)你远离那个命运(最佳游戏)。
享受:
class Node {
constructor(id) {
this.id = id;
this.neighbors = []; // List of Node instances that can be reached with one move
}
cardOwner(card) {
return (this.id >> (2*card-2)) & 3; // Decode binary representation
}
* generateMoves() { // Yield the moves that player 1 can make
let hands = [[], [], []];
// Translate id to sets of cards
for (let card = 1; card <= 8; card++) hands[this.cardOwner(card)].push(card);
let [desk, myHand, otherHand] = hands;
if (otherHand.length < 3 || otherHand.reduce((a, b) => a + b, 0) !== 15) {
if (myHand.length < 3) {
for (let card of desk) yield this.getTargetId(card);
} else {
for (let discard of myHand) {
for (let replace of desk) yield this.getTargetId(discard, replace);
}
}
}
}
getTargetId(...cards) {
let id = this.id;
for (let card of cards) id ^= 1 << (card*2-2);
// Toggle owner, as in each state we consider player 1 as the player to move
for (let mask = 3; mask < 0x10000; mask <<= 2) {
if (id & mask) id ^= mask;
}
return id;
}
compare(next) {
let id = this.id;
let nextId = next.id;
let diff = { replace: 0, discard: 0 };
for (let card = 1; card <= 8; card++) {
if ((id & 3) && !(nextId & 3)) diff.discard = card;
else if (!(id & 3) && (nextId & 3)) diff.replace = card;
id >>= 2;
nextId >>= 2;
}
return diff;
}
}
class Game {
constructor() {
this.hist = []; // Previous states (after moves have been made)
this.node = new Node(0); // 0 is the initial state where all 8 cards are on desk
this.visited = new Map; // Evaluated states; Node instances keyed by their identifier
}
move(target) {
this.hist.push(this.node);
this.node = target;
}
undo() {
this.node = this.hist.pop();
}
minimax() {
if (this.node.value !== undefined) return; // Already visited
// Generate neighbors for this Node
this.node.neighbors = Array.from(this.node.generateMoves(), targetId => {
let target = this.visited.get(targetId); // Get a Node instance
if (!target) this.visited.set(targetId, target = new Node(targetId));
return target;
});
if (!this.node.neighbors.length) { // Game over!
this.node.value = 100;
return;
}
// Assign temporary value while depth-first search is ongoing.
// This will ensure that a revisit is considered a draw (repetition of moves)
this.node.value = 0; // 0 indicates a draw
let bestValue = -Infinity;
for (let target of this.node.neighbors) {
this.move(target);
this.minimax();
bestValue = Math.max(bestValue, this.node.value);
this.undo();
}
// Definite value: reduce its absolute value, so to favour quick wins over slow wins
this.node.value = bestValue && (Math.abs(bestValue) - 1) * Math.sign(-bestValue);
}
}
let game = new Game;
game.minimax(); // Create the full game tree rooted at the initial state
// I/O management
let state = document.getElementById("state");
let moves = document.getElementById("moves");
let undo = document.getElementById("undo");
let message = document.getElementById("message");
function display() {
let turn = game.hist.length % 2;
let mapper = [1, turn*2, 2 - turn*2];
for (let card = 1; card <= 8; card++) {
let owner = game.node.cardOwner(card);
let ownerRow = state.rows[mapper[owner]];
for (let row of state.rows) {
row.cells[card].textContent = row === ownerRow ? card : "";
}
}
state.rows[0].classList.toggle("turn", !turn);
state.rows[2].classList.toggle("turn", turn);
message.textContent = game.node.neighbors.length ? `Player ${turn + 1} to play. Make your choice:` : `Player ${2 - turn} won!`;
undo.disabled = !game.hist.length;
moves.innerHTML = "";
for (let target of game.node.neighbors) {
let { discard, replace } = game.node.compare(target);
let button = document.createElement("button");
button.textContent = `${discard ? `Replace ${discard} with` : "Select"} ${replace} ${
target.value < 0 ? `(lose in ${101+target.value})` :
target.value > 0 ? `(win in ${101-target.value})` : "(draw)"
}`;
moves.appendChild(button);
}
}
moves.addEventListener("click", function (e) {
let moveIndex = [...moves.children].indexOf(e.target);
if (moveIndex >= 0) game.move(game.node.neighbors[moveIndex]);
display();
});
undo.addEventListener("click", function() {
game.undo();
display();
});
display();
td { border: 1px solid; background: yellow; min-width: 1em; text-align: center }
td:empty, td:first-child { border-color: transparent; background: none }
tr:nth-child(odd) { background: lightblue }
tr.turn { background: orange }
#moves > * { display: block }
<table id="state">
<tr>
<td>Player 1: </td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
</tr>
<tr>
<td>Desk: </td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
</tr>
<tr>
<td>Player 2: </td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
</tr>
</table>
<div id="message"></div>
<button id="undo">Undo last move</button><br>
<div id="moves"></div>
备注
在双方发挥最佳的情况下,本场比赛为平局。如果我们考虑到第二个玩家在游戏的第一阶段必须更加小心不要犯错,那么第一个玩家有点优势。
当第一个玩家选择牌 6 或 8 时,第二个玩家 必须 拿牌 5 否则他们将输(最佳游戏)。博弈树迅速扩大,但也有其他州只有一个好棋。
发现的不同状态数(包括根状态)为 1713。该算法不考虑 30 步限制,因为这是一个“乏味”的规则。 not 有这个限制的好处是算法需要足够聪明以检测重复。有了30手的限制,就不需要进行这样的循环检查了。
Python
可以在 repl.it
上找到相同的实现(在控制台上交互)
我想为这个游戏做一个机器人:
游戏使用 8 张牌,编号为 1 到 8。游戏开始时,这些牌都在 table(桌子)上,正面朝上。引用自说明:
The goal of the game is that you reach the sum of 15 with 3 cards in your hand. The first person to reach this goal in a maximum of 30 turns is the winner.
Each time at your turn, you are supposed to take one card from the desk.
When you have 3 cards in your hand, you should swap one of your cards in your hand with a card on the desk.
双方玩家始终可以看到所有牌。
赢得比赛的最佳算法是什么?
您可以将游戏表示为有向图,节点是游戏中的状态,边是移动。
然后,您可以将最终输赢状态(即其中一名玩家达到 15 的状态)标记为赢或输。 (有些状态是“不可能的”,两个玩家都达到了 15)。
然后,您可以重复遍历所有状态,如果有一个获胜的动作,则将状态标记为获胜,如果所有动作都失败,则将状态标记为失败。你重复直到没有状态从未知变为输赢。
在像这样的游戏中,有些状态可能是平局,因为可以重复游戏状态。此处确实如此,因此可能是当您完成迭代时,某些状态既未标记为成功也未标记为失败。这些是重复抽奖。
通常在解决游戏时,您必须高效地进行迭代。这里的状态总数是1769,比较少,所以每次遍历所有状态就可以了。
这个答案的最后是解决游戏的代码 prints the optimal strategy。
在攻略中,要移动的玩家排在第一位。
例如策略中的这一行:
128 - 346 : WINS: 1->5 | LOSSES: 1->7, 2->7, 8->5, 8->7 | DRAWS: 2->5
表示要移动的玩家有牌 [1, 2, 8] 而他们的对手有牌 [3, 4, 6]。移动的玩家可以通过丢弃 1 并拿起 5 来获胜,如果移动的玩家丢弃 2 并拿起 5,则移动的玩家可以平局(双方玩家的最佳发挥),否则其他玩家可以强制获胜。
第一行:
/ - / : WINS: none | LOSSES: none | DRAWS: 1, 2, 3, 4, 5, 6, 7, 8
表示在初始状态下(/
表示没有手牌),那么无论你拿起什么牌,游戏都是平局。
import itertools
def states():
N = set(range(1, 9))
for k in range(7): # total number of cards picked up.
for p1 in itertools.combinations(N, k//2):
for p2 in itertools.combinations(N.difference(p1), (k+1)//2):
yield p1, p2
STATES = list(states())
PICKUPS = dict(((p1, p2), set(range(1, 9)).difference(p1).difference(p2)) for p1, p2 in STATES)
WL = dict()
def moves(p1, p2):
if len(p1) == 3 and len(p2) == 3:
for i in range(3):
for c2 in PICKUPS[p1, p2]:
new_p1 = p1[:i] + p1[i+1:] + (c2,)
yield p2, tuple(sorted(new_p1))
return
for c2 in PICKUPS[p1, p2]:
new_p1 = tuple(sorted(p1 + (c2,)))
yield p2, new_p1
def raw_score(S):
p1, p2 = S
w1 = len(p1) == 3 and sum(p1) == 15
w2 = len(p2) == 3 and sum(p2) == 15
if w1 and w2:
return 0
elif w1:
return 1
elif w2:
return -1
for S in STATES:
score = raw_score(S)
if score is not None:
WL[S] = score
while True:
modified = False
for S in STATES:
if S in WL:
continue
move_scores = [WL.get(x) for x in moves(*S)]
if -1 in move_scores:
WL[S] = 1
modified = True
if all(x == 1 for x in move_scores):
WL[S] = -1
modified = True
if not modified:
break
def droptake(old, new):
if len(old) == len(new):
return [x for x in old if x not in new][0], [x for x in new if x not in old][0]
else:
return None, [x for x in new if x not in old][0]
def describe(S, S2):
old = S[0]
new = S2[1]
drop, take = droptake(old, new)
if drop:
return '%d->%d' % (drop, take)
else:
return '%d' % take
def describe_state(S):
p1, p2 = S
return (''.join(str(x) for x in p1) or '/')+ ' - ' + (''.join(str(x) for x in p2) or '/')
index = dict()
for i, S in enumerate(STATES):
index[S] = i
for S in STATES:
wins = []
loses = []
draws = []
score = raw_score(S)
if score is not None:
if score != 0:
print(describe_state(S), ' :', '<WON>' if score == 1 else '<LOST>')
continue
for S2 in moves(*S):
score = WL.get(S2)
if score == -1:
wins.append(describe(S, S2))
elif score == 1:
loses.append(describe(S, S2))
else:
draws.append(describe(S, S2))
print(describe_state(S), ' :', 'WINS:', ', '.join(wins) or 'none', '| LOSSES:', ', '.join(loses) or 'none', '| DRAWS:', ', '.join(draws) or 'none')
我建议创建一个 minimax 算法,它(就像@PaulHankin 的回答)创建一个博弈树。不同之处在于状态是 发现的 随着移动的进行。
这个问题引起了我的兴趣,因此我在 JavaScript 中创建了一个交互式代码段,这很有趣。它向您显示可用的动作,以及它们是否会导致胜利、失败或平局。在赢或输的情况下,它会告诉你有多少次半步(步数)你远离那个命运(最佳游戏)。
享受:
class Node {
constructor(id) {
this.id = id;
this.neighbors = []; // List of Node instances that can be reached with one move
}
cardOwner(card) {
return (this.id >> (2*card-2)) & 3; // Decode binary representation
}
* generateMoves() { // Yield the moves that player 1 can make
let hands = [[], [], []];
// Translate id to sets of cards
for (let card = 1; card <= 8; card++) hands[this.cardOwner(card)].push(card);
let [desk, myHand, otherHand] = hands;
if (otherHand.length < 3 || otherHand.reduce((a, b) => a + b, 0) !== 15) {
if (myHand.length < 3) {
for (let card of desk) yield this.getTargetId(card);
} else {
for (let discard of myHand) {
for (let replace of desk) yield this.getTargetId(discard, replace);
}
}
}
}
getTargetId(...cards) {
let id = this.id;
for (let card of cards) id ^= 1 << (card*2-2);
// Toggle owner, as in each state we consider player 1 as the player to move
for (let mask = 3; mask < 0x10000; mask <<= 2) {
if (id & mask) id ^= mask;
}
return id;
}
compare(next) {
let id = this.id;
let nextId = next.id;
let diff = { replace: 0, discard: 0 };
for (let card = 1; card <= 8; card++) {
if ((id & 3) && !(nextId & 3)) diff.discard = card;
else if (!(id & 3) && (nextId & 3)) diff.replace = card;
id >>= 2;
nextId >>= 2;
}
return diff;
}
}
class Game {
constructor() {
this.hist = []; // Previous states (after moves have been made)
this.node = new Node(0); // 0 is the initial state where all 8 cards are on desk
this.visited = new Map; // Evaluated states; Node instances keyed by their identifier
}
move(target) {
this.hist.push(this.node);
this.node = target;
}
undo() {
this.node = this.hist.pop();
}
minimax() {
if (this.node.value !== undefined) return; // Already visited
// Generate neighbors for this Node
this.node.neighbors = Array.from(this.node.generateMoves(), targetId => {
let target = this.visited.get(targetId); // Get a Node instance
if (!target) this.visited.set(targetId, target = new Node(targetId));
return target;
});
if (!this.node.neighbors.length) { // Game over!
this.node.value = 100;
return;
}
// Assign temporary value while depth-first search is ongoing.
// This will ensure that a revisit is considered a draw (repetition of moves)
this.node.value = 0; // 0 indicates a draw
let bestValue = -Infinity;
for (let target of this.node.neighbors) {
this.move(target);
this.minimax();
bestValue = Math.max(bestValue, this.node.value);
this.undo();
}
// Definite value: reduce its absolute value, so to favour quick wins over slow wins
this.node.value = bestValue && (Math.abs(bestValue) - 1) * Math.sign(-bestValue);
}
}
let game = new Game;
game.minimax(); // Create the full game tree rooted at the initial state
// I/O management
let state = document.getElementById("state");
let moves = document.getElementById("moves");
let undo = document.getElementById("undo");
let message = document.getElementById("message");
function display() {
let turn = game.hist.length % 2;
let mapper = [1, turn*2, 2 - turn*2];
for (let card = 1; card <= 8; card++) {
let owner = game.node.cardOwner(card);
let ownerRow = state.rows[mapper[owner]];
for (let row of state.rows) {
row.cells[card].textContent = row === ownerRow ? card : "";
}
}
state.rows[0].classList.toggle("turn", !turn);
state.rows[2].classList.toggle("turn", turn);
message.textContent = game.node.neighbors.length ? `Player ${turn + 1} to play. Make your choice:` : `Player ${2 - turn} won!`;
undo.disabled = !game.hist.length;
moves.innerHTML = "";
for (let target of game.node.neighbors) {
let { discard, replace } = game.node.compare(target);
let button = document.createElement("button");
button.textContent = `${discard ? `Replace ${discard} with` : "Select"} ${replace} ${
target.value < 0 ? `(lose in ${101+target.value})` :
target.value > 0 ? `(win in ${101-target.value})` : "(draw)"
}`;
moves.appendChild(button);
}
}
moves.addEventListener("click", function (e) {
let moveIndex = [...moves.children].indexOf(e.target);
if (moveIndex >= 0) game.move(game.node.neighbors[moveIndex]);
display();
});
undo.addEventListener("click", function() {
game.undo();
display();
});
display();
td { border: 1px solid; background: yellow; min-width: 1em; text-align: center }
td:empty, td:first-child { border-color: transparent; background: none }
tr:nth-child(odd) { background: lightblue }
tr.turn { background: orange }
#moves > * { display: block }
<table id="state">
<tr>
<td>Player 1: </td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
</tr>
<tr>
<td>Desk: </td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
</tr>
<tr>
<td>Player 2: </td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
</tr>
</table>
<div id="message"></div>
<button id="undo">Undo last move</button><br>
<div id="moves"></div>
备注
在双方发挥最佳的情况下,本场比赛为平局。如果我们考虑到第二个玩家在游戏的第一阶段必须更加小心不要犯错,那么第一个玩家有点优势。
当第一个玩家选择牌 6 或 8 时,第二个玩家 必须 拿牌 5 否则他们将输(最佳游戏)。博弈树迅速扩大,但也有其他州只有一个好棋。
发现的不同状态数(包括根状态)为 1713。该算法不考虑 30 步限制,因为这是一个“乏味”的规则。 not 有这个限制的好处是算法需要足够聪明以检测重复。有了30手的限制,就不需要进行这样的循环检查了。
Python
可以在 repl.it
上找到相同的实现(在控制台上交互)