理解 Grossman & Zeitman 的算法

Question

我读过论文 An inherently iterative computation of Ackermann's function，由 Grossman & Zeitman 发表，他们在其中提出了一种优化阿克曼函数的算法。

我们知道阿克曼函数产生的结果在子序列A(m,n)

m=0: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,...
m=1: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,...
m=2: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,...
m=3: 5, 13, 29, 61, 125, 253, 509, 1021, 2045, 4093, 8189, 16381, 32765, 65533,...
m=4: 13, 65533,...

声明 Next 数组是 以跟踪我们在每个子序列中的位置 ，而 Goal 数组 就是在将刚刚计算的值转移到下一个子序列之前，跟踪我们需要到达的位置。它所做的只是将 1 递增到值：

Ackermann(m, n)
    {next and goal are arrays indexed from 0 to m, initialized so that next[0] 
     through next[m] are 0, goal[0] through goal[m - l] are 1, and goal[m] is -1} 
    repeat
        value ← next[0] + 1 
        transferring ← true 
        current ← 0 
        while transferring do begin
           if next[current] = goal[current] then goal[current] ← value
                                            else transferring ← false
           next[current] ← next[current] + 1
           current ← current + 1 
           end while
    until next[m] = n + 1 
    return value {the value of A(m, n)}
    end Ackermann

我很难理解这两个数组如何指示我们所在的位置/要移动的位置？当我们停在 next[m] = n + 1 时，我无法确定它到底意味着什么，为什么要特别指定这个值？我已经尝试追踪算法，但我仍然不知道它是如何工作的。该算法算作自下而上的实现吗？

这是一个 java 代码，打印值、当前和两个数组

import java.util.Arrays;

public class Ack {
    static int arrayAck(int m, int n) {
        //Next array to keep track of where we are in each subsequence, 
        //and Goal array to keep track of where we need to reach before transferring the value just calculated to the next subsequence.
        int[] next = new int[m+1];
        int[] goal = new int [m+1];
        Arrays.fill(next, 0);
        Arrays.fill(goal, 1);
        goal[m] = -1;
        
        int value = next[0] + 1;
        while(true) {
            value = next[0] + 1;
            boolean transferring = true;
            int current = 0;
            
            System.out.println("--");
            System.out.println("Next = " + Arrays.toString(next));
            System.out.println("Goal = " + Arrays.toString(goal));
            System.out.println("Current= " + current);
            System.out.println("Value = " + value);
            while(transferring) {
                if(next[current] == goal[current])
                    goal[current] = value;
                else
                    transferring = false;
                next[current]=next[current]+1;
                current += 1;
                System.out.println("Current= " + current);
                System.out.println("Next = " + Arrays.toString(next));
                System.out.println("Goal = " + Arrays.toString(goal));
            }
            
            if(next[m]==n+1)
                break;
            
        }
        
        return value;
    }
     
    public static void main(String[] args) {
        int m=2,n=2;
        System.out.println("Ackermann value for ("+m+","+n+") = " + arrayAck(m, n));
    }

}

Answer 1

我通读了这篇论文以了解他们提出的算法，当您掌握它时实际上并没有那么糟糕。

基本思路如下。我们有一个 Ackermann 函数的两个参数版本，定义如下：

A(0, n) = n + 1
A(i, 0) = A(i - 1, 1)
A(i, n) = A(i - 1, A(i, n - 1))

作者建议的方法本质上是 space 优化的、自下而上的阿克曼函数计算。与其给出他们算法的最终版本，不如看看我们是否可以从上述规则中推导出来。我们假设填充一个 2D table，其中每一行对应一个不同的 i 值，每一列对应一个 n 值。按照论文中的约定，我们将 i = 0 的行放在顶部，如下所示：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0 
i=1
i=2
i=3

我们的目标是填写此 table。现在，不用担心 table; 的大小。我们可以稍后确定。

阿克曼函数的三个例子中的第一个说

Rule 1: A(0, n) = n + 1.

我们可以将其解释为

The top row of the table is the sequence of values 1, 2, 3, 4, 5, ...

我们现在可以填写，但出于稍后会变得更清楚的原因，现在让我们暂缓这样做。

下一条规则是

Rule 2: A(i, 0) = A(i - 1, 1).

我们可以将其解释为

The first entry in each successive row of the table is the second (index 1) entry of the row above it.

牢记这两条规则，让我们开始填写 table 个条目。我们可以使用第一条规则填充 A(0, 0) 的槽：A(0, 0) = 0 + 1 = 1.

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1
i=1
i=2
i=3

现在，让我们填写 A(0, 1) = 1 + 1 = 2:

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2
i=1
i=2
i=3

因为我们刚刚填写了这个 table 条目，我们可以使用下一个规则来填写 A(1, 0) 的槽位，它将由 A(0, 1) 给出= 2:

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2
i=1  2
i=2
i=3

为了在填充 i = 1 的行方面取得更多进展，我们需要提取阿克曼函数的第三种情况：

Rule 3: A(i, n) = A(i - 1, A(i, n - 1)).

这条规则很密集，但它的直觉非常好。要填写 table 插槽 A(i, n)，请执行以下操作：

确定 table 插槽 A(i, n - 1) 中的内容。将此数字解释为索引 k.
复制你上一行的第k项（即位置A(i - 1, k) = A(i - 1, A(i, n - 1))）。

在我们的例子中，假设我们要填写 A(1, 1)。使用上述过程，我们执行以下操作：

查看 table 插槽 A(1, 0) = 2。
从第 0 行复制索引 2（即 A(0, 2)）。

但是，查看我们的 table，我们可以看到我们还没有计算 A(0, 2)。所以让我们这样做吧。我们通过规则 1 计算 A(0, 2) = 3：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3
i=1  2
i=2
i=3

反过来，根据规则 3，这意味着 A(1, 1) = 3:

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3
i=1  2    3
i=2
i=3

现在发生了一些有趣的事情。我们刚刚填写了 A(1, 1)，根据规则 2，我们可以通过复制 A(1, 1) 来填写 A(2, 0)：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3
i=1  2    3
i=2  3
i=3

让我们来盘点一下刚刚发生的事情。

我们通过规则 1 计算了 A(0, 2)。
计算第 0 行的另一个条目通过规则 3 为我们提供了第 1 行的新条目。
计算第 1 行的另一个条目通过规则 2 为我们提供了第 2 行的新条目。

换句话说，通过计算第 0 行中的另一个值，我们产生了向下传播的涟漪效应，使我们能够在第 1 行、第 2 行等中计算更多条目。因此建议采取一种方法：生成第 0 行中的另一个值，使用规则 1 很容易做到，从那里看我们是否可以使用规则 2 和规则 3 在 table 中“波纹”该值更高。

要查看实际效果，让我们使用规则 1 填写 A(0, 3) = 3+1 = 4：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4
i=1  2    3
i=2  3
i=3

现在，看第 1 行。规则 3 说要填写 A(1, 3)，我们需要先计算 A(1, 2)（完成 - 它是 3），然后取行的第 3 列0. 我们刚刚计算了第 0 行的第 3 列，这意味着我们应该将 A(0, 3) 复制到 A(1, 2):

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4
i=1  2    3    4
i=2  3
i=3

现在，看第 2 行。我们已经填写了 A(2, 0)，所以这里要计算的下一个是 A(2, 1)。为了计算 A(2, 1)，我们首先计算 A(2, 0)（完成 - 它是 3），所以我们需要从第 1 行的第 3 列中获取条目。不过我们还没有计算它，所以没有更多的事情发生。涟漪止于此。

让我们计算第 0 行的下一个条目：A(0, 4) = 4+1 = 5。这里是：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4
i=2  3
i=3

查看第 1 行，填写 A(1, 3)，我们查看 A(1, 2)（即 4）并将 A(0, 4) 复制过来：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4    5
i=2  3
i=3

查看第 2 行，填写 A(2, 2)，我们查看 A(2, 1)（即 3）并将 A(1, 3) 复制过来：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4    5
i=2  3    5
i=3

现在，看第 3 行。规则 2 说 A(3, 0) = A(2, 1)，所以我们将 A(2, 1) 复制过来：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4    5
i=2  3    5
i=3  5

涟漪是如何完成的，因为那是我们 table.

的所有行

为了看看这个系统是如何演变的，让我们运行再掀起一阵涟漪，一个接一个：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6
i=1  2    3    4    5    6
i=2  3    5
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7
i=1  2    3    4    5    6    7
i=2  3    5    7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8
i=1  2    3    4    5    6    7    8
i=2  3    5    7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9
i=1  2    3    4    5    6    7    8    9
i=2  3    5    7    9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10
i=1  2    3    4    5    6    7    8    9    10
i=2  3    5    7    9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10   11
i=1  2    3    4    5    6    7    8    9    10   11
i=2  3    5    7    9   11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10   11   12
i=1  2    3    4    5    6    7    8    9    10   11   12
i=2  3    5    7    9   11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10   11   12   13
i=1  2    3    4    5    6    7    8    9    10   11   12   13
i=2  3    5    7    9   11   13
i=3  5   13

您可以对照论文 table 中的图 1 检查这些值，您会发现这是正在发生的事情的（部分）匹配。

执行此算法的一个重要细节是，在每个时间点，我们只关心每一行中的最后一项。该项目告诉我们在复制下一个值之前需要在我们下面的行中填充什么索引。考虑到这一点，我们可以想象运行再次使用该算法，但只存储每行中的最后一项。这可能看起来像这样：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1
i=1
i=2
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0       2
i=1  2
i=2
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0            3
i=1       3
i=2  3
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                 4
i=1            4
i=2  3
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                      5
i=1                 5
i=2       5
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                           6
i=1                      6
i=2       5
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                7
i=1                           7
i=2            7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                     8
i=1                                8
i=2            7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                          9
i=1                                     9
i=2                 9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                               10
i=1                                          10
i=2                 9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                                    11
i=1                                               11
i=2                      11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                                         12
i=1                                                    12
i=2                      11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                                              13
i=1                                                         13
i=2                           13
i=3       13

这是一个很大的 space 节省，并且它揭示了每一行的一些有趣的东西。对于每一行，我们只需要存储两个值：

存储在该行中的数字。
最右边数字所在的列索引。

这两个值有什么意义？第一个数字可以用两种方式解释。首先，它是阿克曼序列的实际值。其次，对于第一行上方的行，它是一个指示器，表示“我们需要在我下面的行中处于什么位置才能前进到我行中的下一列？” 这是纸张存储在 goal 数组中的值。

这些数字中的第二个反过来可以被认为是在说“我处于什么位置？”，然后可以使用哪些更高级别的行来确定何时推进其 goal 值。 这是纸张存储在next数组中的值。

我不会遍历论文中的伪代码，而是从头开始重写算法，最终得到与论文相似但不完全相同的东西。基本思路是模拟上述过程。我们跟踪我们在 table 的每一行中所在的列（我将其称为 positions 以使其含义更清楚）以及 [的每一行中存储的值是什么=166=]（我称之为 values）。

为了简化处理某些行中有值而其他行中没有值的情况的逻辑（例如，请参见上面跟踪中的前几个步骤），并将规则 2 和规则 3 统一在一起，我采用了以下策略。我们将扩展 Ackermann 函数，以便 A(i, -1) 的值按以下方式定义：

A(0, -1) = 0
A(i, -1) = 1

然后我们假设每一行都从位置 -1 开始，意思是“就在第一列之前”。直觉是第 1 行、第 2 行、第 3 行等都在等待一个项目出现在它们上面一行的位置 1，而在第 0 行中，第一个之前的值应该是 0，这样序列第 0 行中产生的值是系列 0、1、2、3，...。

考虑到这一点，这是我的实现：

/**
 * File: Ackermann.java
 * Author: Keith Schwarz (htiek@cs.stanford.edu)
 *
 * An implementation of a variation of Grossman and Zeitman's fast algorithm
 * for computing the Ackermann function. This implementation uses O(i) space
 * to compute A(i, n), and takes O(A(i, n)) time. The only operation required
 * on integers is "increment an integer" and "copy an integer." While this is
 * implemented using vanilla ints, it could easily be extended to use BigInteger
 * types or the like.
 */
public class Ackermann {
    public static int ackermann(int i, int n) {
        /* Bounds-check. */
        if (i < 0 || n < 0) throw new RuntimeException("Invalid arguments.");
    
        /* Positions array stores what column we're in in each row of
         * the table. Initially this is all -1 to signify that we're just
         * before the first column.
         */
        int[] positions = new int[i + 1];
        for (int j = 0; j < positions.length; j++) {
            positions[j] = -1;
        }
        
        /* Values array stores the value currently in the filled column
         * in each row. The value in the zeroth row is set to zero because
         * our algorithm works by repeatedly incrementing its value and
         * we need it to be the case that A(0, 0) = 1. Since we start off
         * with "A(0, -1)" computed, we place a zero there.
         *
         * All the other values are set to 1. This corresponds to the rule
         * that A(i, 0) = A(i - 1, 1), meaning we're waiting for column 1
         * to come up in the level below us.
         */
        int[] values = new int[i + 1];
        for (int j = 1; j < values.length; j++) {
            values[j] = 1;
        }
        
        /* Run the algorithm until we compute A(i, n), which happens when
         * positions[i] == n.
         */
        while (positions[i] != n) {        
            /* Push the value in the bottom row forward and increment it to simulate
             * computing the next value of A(0, *).
             */
            values[0]++;
            positions[0]++;
            
            /* Now, do the ripple. We continue rippling upward while
             *
             * 1. There's a layer above us to ripple to, and
             * 2. The value above us equals our current position.
             */
            for (int j = 1; j <= i && positions[j - 1] == values[j]; j++) {
                values[j] = values[j - 1]; // Copy the value
                positions[j]++;            // Shift forward a column
            }
        }
        
        return values[i];
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java Ackermann i n");
            return;
        }
        
        int i = Integer.parseInt(args[0]);
        int n = Integer.parseInt(args[1]);
        
        System.out.println("A(" + i + ", " + n + ") = " + ackermann(i, n));
    }
}

最后一个细节 - 我相信 Zeitman 和 Grossman 最初的运行O(iA(i, n)) 的时间分析是正确的，但并不严密。具体来说，这假设在算法的每一步中，我们都将一个值从一层复制到下一层。然而，事实并非如此。每行中的值增长得如此之快，以至于在大多数步骤中我们不会将任何内容复制到 table 的任何高行。更具体地说，请注意 table 的第 2 行由每隔一个奇数组成，因此只有一半的增量步骤会在那里复制一些东西，而上面的行肯定不会复制超过图层值的一半在他们下面。这意味着，平均而言，每个波纹将有 100% 的机会复制到第 1 行，最多有 50% 的机会复制到第 2 行，最多有 25% 的机会复制到第 3 行，等等。制作的副本并使用这里（至少）几何衰减的事实意味着“平均”操作仅复制到 O(1) 总行。这意味着我们正在执行 O(A(i, n)) 增量步骤，每个步骤（平均）仅复制到 O(1) 总行，因此完成的总工作量为 O(A( i, n))。这对运行时间限制（A(i, n) 项是巨大的！）来说并不是一个巨大的改进，但它是一个稍微好一点的结果。

理解 Grossman & Zeitman 的算法

Understanding Grossman & Zeitman's Algorithm

algorithm

optimization

ackermann