构建部分匹配 table 时，KMP 中第二个 while 循环的目的是什么？

Question

为 KMP 构建部分匹配 table 时：

void buildBackTable() { 
    int i = 0, j = -1; b[0] = -1; 
    while (i < m) { 
        while (j >= 0 && P[i] != P[j]) j = b[j]; //Why is this a while loop!?
        i++; j++;
        b[i] = j;
    }
}

为什么第二个 while 循环不是条件循环？如：

void buildBackTable() { 
    int i = 0, j = -1; b[0] = -1; 
    while (i < m) { 
        if (j >= 0 && P[i] != P[j]) j = b[j]; //Why not?
        i++; j++;
        b[i] = j;
    }
}

在我试过的大多数例子中，这个语句用于在没有匹配时将j重置为-1，所以当下一次迭代到来时，我们比较字符串的第一个字符（P[j ]) 与位置 i 处的字符 (P[i]).

Answer 1

首先澄清一下，在 OP 的符号中 m 代表字符串的长度 P。请注意 b 的长度为 m+1。（这是其中一种实现，尽管到目前为止在符号方面不是最自然的，这可能导致了混淆）

让我们看看 buildBackTable() 的目的是什么。本质上，对于生成的数组 b，b[i] 将存储最大值 j，这样 j < i，以及 P[0..j-1] == P[i-j..i-1]、b[0]=-1。

理解以下属性并不难：

b[i] <= b[i-1] + 1
如果b[i] = j，那么对于任何k < j，s.t P[0..k-1] == P[i-k..i-1]，我们知道b[j] >= k。同时，显然是b[i] > b[j]或者b[i] = 0。

这意味着我们可以通过 b[i] -> b[b[i]] -> b[b[b[i]]]... 等从大到小轻松枚举 k 值。

此枚举允许我们跳过 P 的前缀并找到最大值 j、s.t。 P[0..j-1] == P[i-j..i-1] 和 P[i] == P[j].

至于你的问题，当条件检查（没有循环失败）时，只考虑字符串aaaac。然后b[4] = 3。但是b[5] = 0。只有条件检查，你会得到 b[5]=3.

构建部分匹配 table 时，KMP 中第二个 while 循环的目的是什么？

What is the purpose of the second while loop in KMP when building the partial match table?

string

algorithm

computer-science

string-matching

knuth-morris-pratt