Rabin-Karp：滚动哈希计算向先前计算的哈希添加一个大质数

Question

我想我在概念上理解了使用滚动哈希的 RabinKarp 模式匹配算法。在执行示例实现 here 时，我发现一个大素数 q 被添加到先前计算的滚动哈希中。

for (int i = m; i < n; i++) {
            // Remove leading digit, add trailing digit, check for match. 
            txtHash = (txtHash + q - RM*txt.charAt(i-m) % q) % q; //Why +q here?
            txtHash = (txtHash*R + txt.charAt(i)) % q; 

            // match
            int offset = i - m + 1;
            if ((patHash == txtHash) && check(txt, offset))
                return offset;
        }

我不确定为什么需要这样做。我可以得到一些帮助吗？

在我的有限测试中，无论是否包含 q 个术语，我都会得到相同的结果。

这是否与正在实施的算法版本 (Monte Carlo/Las Vegas) 有关？

Answer 1

+q 项用于避免处理负数。

我们希望txtHash始终位于区间[0;q[，没有这个+q它也可能位于]-q;0[.

这可能会导致丢失模式。例如，如果 patHash = 0xdead 但你计算 txtHash = -q+0xdead。这两个值在数学上是相等的 mod q 但与 Java 不同 % q.

Rabin-Karp：滚动哈希计算向先前计算的哈希添加一个大质数

Rabin-Karp: rolling hash computation adds a large prime number to previously computed hash

string

montecarlo

rabin-karp

rolling-computation