如何理解 Monte Carlo Tree Search 的 4 个步骤

How to understand the 4 steps of Monte Carlo Tree Search

来自许多博客和这个https://web.archive.org/web/20160308070346/http://mcts.ai/about/index.html 我们知道MCTS算法的过程有4个步骤。

Selection: Starting at root node R, recursively select optimal child nodes until a leaf node L is reached.

这里的叶节点L是什么意思？我想它应该是一个代表游戏结束状态的节点，或者另一个结束游戏的词。如果 L 不是终端节点（游戏的一个结束状态），我们如何确定 selection 步骤在节点 L 上停止？

Expansion: If L is a not a terminal node (i.e. it does not end the game) then create one or more child nodes and select one C.

从这个描述中我意识到显然我之前的想法是错误的。那么如果L不是终端节点，那么说明L应该有children，为什么不在"Selection"步继续从L中找一个child呢？这一步有L的children列表吗？
从这一步本身的描述来看，什么时候创建一个child个节点，什么时候需要创建多个child个节点？基于什么 rule/policy 我们 select 节点 C？

Simulation: Run a simulated playout from C until a result is achieved.

因为第一个问题的困惑，完全不能理解为什么要模拟游戏。我想从selection这一步，我们可以到达终点节点，游戏应该在这条路径的节点L上结束。我们甚至不需要做"Expansion"，因为节点L是终端节点。

Backpropagation: Update the current move sequence with the simulation result.

很好。最后一个问题，你是从哪里得到这些问题的答案的？

谢谢

顺便说一句，我也post同样的问题https://ai.stackexchange.com/questions/16606/how-to-understand-the-4-steps-of-monte-carlo-tree-search

What does leaf node L mean here?

为了便于解释，我假设所选节点的所有子节点都在算法的扩展阶段添加。

算法开始时，树只由根节点（叶节点）构成。

扩展阶段将从根可达的所有状态添加到树中。现在你有一个更大的树，叶子是最后添加的节点（根节点不再是叶子）。

在算法的任何给定迭代中，树（图片的灰色区域）都会生长。它的一些叶子可能是终端状态（根据game/problem的规则）但它不被授予。

如果扩展太多，可能会运行内存不足。因此扩展阶段的典型实现只向现有树添加一个节点。

在这种情况下，您可以将单词 leaf 更改为 not fully expanded:

Starting at root node R, recursively select optimal child nodes until a not fully expanded node L is reached

Based on what rule/policy do we select node C?

它取决于域。通常你随机选择一个move/state.

注释

图片来自 Multi-Objective Monte Carlo 实时游戏树搜索（Diego Perez、Sanaz Mostaghim、Spyridon Samothrakis、Simon M. Lucas ).

如何理解 Monte Carlo Tree Search 的 4 个步骤

How to understand the 4 steps of Monte Carlo Tree Search

algorithm

search

artificial-intelligence

monte-carlo-tree-search