如何创建解决魔方的模式数据库？

How to create a pattern database for solving Rubik's Cube?

rubiks-cube

我正在尝试实施 Korf's algorithm 来解决 3x3x3 魔方。部分解决方案是创建模式数据库。

这是引自 paper 的引述，从字面上看，它包含了关于如何做到这一点的全部信息：

Using a breadth-first search from the goal state, we can enumerate these states, and record in a table the number of moves required to solve each combination of corner cubies.

如何在代码中转换它？由于在每一步上，我们都有多个目标状态，我不清楚我们如何才能 "enumerate" 从它可以到达的所有状态。

我已经实现了Korf的算法，你可以参考我的代码：https://github.com/benbotto/rubiks-cube-cracker/代码很多，这里就不多说了post，不过我可以给一些一般算法提示。

首先，Korf 的论文建议使用三个模式数据库，而不仅仅是一个。其中一个数据库存储解决任何立方体角块所需的移动次数。有 8 个角块，每个角块可以占据 8 个位置中的任何一个，所以有 8 个！可能的排列。每个角块都可以以 3 种不同的方式定向——例如，三个贴纸中的任何一个都可以面朝上——但是 7 个立方体的方向决定了第 8 个立方体的方向（根据立方体定律）。因此，角的方向有 3^7 种可能的方式。总共有8个！ * 3^7 种可能的立方体角点打乱方式，这 88,179,840 种状态可以在合理的时间内（30 分钟左右）迭代。所有角落状态都可以在 11 次或更少的移动中到达，因此角落模式数据库中的每个条目都可以存储在一个半字节（4 位）中。在磁盘上，角落图案数据库占用约42MB。

广度优先搜索可用于填充此数据库。应用移动并使用角的状态在数据库中创建索引。如果这个状态之前已经用较少的步数见过，那么搜索树就可以被剪枝：没有理由继续沿着分支往下走；否则，将状态添加到数据库并继续搜索。如上所述，由于在搜索过程中可以进行大量修剪，因此在现代计算机上迭代所有可能的角状态不会花费很长时间。

我的广度优先搜索算法：https://github.com/benbotto/rubiks-cube-cracker/blob/master/Controller/Searcher/BreadthFirstCubeSearcher.cpp
我的边角图案数据库：https://github.com/benbotto/rubiks-cube-cracker/blob/master/Model/PatternDatabase/Korf/CornerPatternDatabase.cpp
这里是我索引角落数据库的地方：https://github.com/benbotto/rubiks-cube-cracker/blob/master/Controller/Command/Solver/KorfCubeSolver.cpp#L30

Korf 建议另外两个数据库：一个用于 12 条边中的 6 条，另一个用于其他 6 条边。 Korf 使用的硬件有限（Sun SPARC Ultra！），但由于我使用的是更现代的机器，所以我选择在每个边缘数据库中使用 7 个边缘。这大大加快了求解器的速度。反正7条边可以占12个位置，所以有12P7（12！/（12-7）！）排列。每个角可以有两种方向，所以 7 条边有 2^7 种可能的方向。同样，这是足够少的立方体状态来迭代，并且可以在 10 次或更少的移动中达到所有状态。将每个条目存储在一个半字节中，每个 7-edge 数据库占用大约 244MB（12P7 * 2^7 / 2 字节）。

出于效率原因（效率在此代码中很重要），我使用非递归算法实现了广度优先搜索。虽然这种类型的搜索对于构建角落数据库效果很好，但内存成本对于索引边缘数据库来说太高了。因此，我使用自定义迭代加深深度优先搜索来索引边缘。 "custom" 部分是它在达到已经遇到的状态时提前退出。

我的自定义 IDDFS 实现：https://github.com/benbotto/rubiks-cube-cracker/blob/master/Controller/Searcher/PatternDatabaseIndexer.cpp
我的各种边缘模式数据库：https://github.com/benbotto/rubiks-cube-cracker/tree/master/Model/PatternDatabase/Korf

上面的磁盘数据库大小当然假设数据库只包含到达每个状态的移动次数，每个存储在一个半字节中。也就是说，数据库是一个散列 table，每个状态都是 table 中的一个索引。因此，您需要一个 "perfect hash" 算法，该算法采用多维数据集排列和 returns 索引。在他的论文中，并且他有多篇关于组合谜题的论文，Korf 对如何创建这样的哈希非常简洁。归结为计算 Lehmer codes. Korf gives a short and sweet linear algorithm in his paper, Large-Scale Parallel Breadth-First Search.

We scan the permutation from left to right, constructing a bit string of length n, indicating which elements of the permutation we’ve seen so far. Initially the string is all zeros. As each element of the permutation is encountered, we use it as an index into the bit string and set the corresponding bit to one. When we encounter element k in the permutation, to determine the number of elements less than k to its left, we need to know the number of ones in the first k bits of our bit string. We extract the first k bits by right shifting the string by n − k. This reduces the problem to: given a bit string, count the number of one bits in it.

We solve this problem in constant time by using the bit string as an index into a precomputed table, containing the number of ones in the binary representation of each index.

我花了很长时间才消化并将其转换为代码，特别是因为他没有谈论索引部分排列。在为边缘片段生成模式数据库时，您需要索引部分排列，因为创建所有 12 条边缘的数据库将非常大。因此，我在 Medium 上写了一篇关于它的文章：https://medium.com/@benjamin.botto/sequentially-indexing-permutations-a-linear-algorithm-for-computing-lexicographic-rank-a22220ffd6e3

最后，我测试了多种不同的数据结构来存储立方体。在我的代码中，我有多个求解器（Korf 和 Thisthlewaite），以及一个图形表示。我实际上将立方体存储在 4 种不同的结构中。使用 Korf 之类的算法，您用来表示魔方的结构对解算器的速度有巨大（特别强调！）的影响。我在 another post 中写了不同的结构，在我的测试中，选项 (4) 是迄今为止最快的 Korf 算法。要创建单个数据库条目，您需要每个立方体的索引和方向（例如角的 {0-7、0-2}）。因此，在创建模式数据库时，将立方体表示为索引和方向是有效的，这样就不需要额外的处理来计算它们。