给定字符串 A 与长度为 |A| 的子字符串的所有汉明距离之和另一个字符串 B

Question

给定两个二进制字符串 a 和 b，求 a 和长度为 |a| 的 b 的所有连续子串之间的汉明距离之和。

输入复制：

01

00111

输出复制：

3

解释：对于第一个样例，b有四个连续的子串，长度为|a|：“00”、“01”、“11”、“11”。 “01”和“00”之间的距离为|0⟩-⟩0| + |1 - 0| =⟩1. "01"和"01"之间的距离为|0⟩-⟩0| + |1 -⟩1| =⟩0. "01" 和 "11" 之间的距离为|0⟩-⟩1| + |1 -⟩1| =⟩1. 最后的距离计数两次，因为字符串“11”出现了两次。这些编辑距离之和为 1 + 0 + 1 + 1 = 3.

在这个问题中，我只想到一个时间复杂度为 O(|a|.|b|) 的蛮力解决方案，例如字符串匹配算法...有没有更快的算法来解决这个问题

Answer 1

当您计算汉明距离之和时，可以非常快地完成：

H := sum of Hamming distances
compute array A and B such as the following:
    A[i]: the number of zeros up to i-th element of b
    B[i]: the number of ones up to i-th element of b

iterate over elements of a for i <- 0:|a|-1:
    as a[i] should be compared with all elemetns of b[i] .. b[|b|-|a|+i]
    its effect over the value of summing distances is:
        if a[i] == 0:
            H += B[|b|-|a|+i] - B[i-1]
        else: 
            H += A[|b|-|a|-1] - A[i-1]

在上面的伪代码中，B[|b|-|a|+i] - B[i-1] 表示 i 元素和 b 的第 |b|-|a|+i 元素之间的个数，对于 A[|b|-|a|-1] - A[i-1]). a 的第 i 个成员应该与这些元素进行比较以计算汉明距离之和。因此，该算法的时间复杂度为\Theta(|a| + |b|).

Answer 2

您可以在线性时间和常数内执行此操作 space。

a中的每一位将与b中的|b| - |a| + 1位进行比较，每次不匹配都会将所有汉明距离的总和加1。

此外，对于 a 的每一位，我们不需要知道它将与 b 进行比较的整个位序列。我们只需要知道它有多少个零和多少个。当我们在 a 中向前移动一位时，相应的范围在 b 中向前移动一位，我们可以很容易地在常数时间内更新这些计数。

这是 python 中的一个实现：

def HammingSum(a,b):
    # compare initial range in b to first bit in a
    range0 = len(b)-len(a)+1
    numZeros = 0
    numOnes = 0
    for i in range(range0):
        if b[i]=='0':
            numZeros += 1
        else:
            numOnes += 1
    total = numOnes if a[0]=='0' else numZeros

    #adjust the range as we compare to the other bits

    for i in range(len(b)-range0):
        #count the bit we're adding to the end of the range
        if b[range0+i]=='0':
            numZeros += 1
        else:
            numOnes += 1

        #uncount the bit we remove from the start of the range
        if b[i]=='0':
            numZeros -= 1
        else:
            numOnes -= 1

        #compare range with bit in a
        total += numOnes if a[i+1]=='0' else numZeros

    return total

给定字符串 A 与长度为 |A| 的子字符串的所有汉明距离之和另一个字符串 B

Sum of all Hamming distances of a given string A from substrings of length |A| of another string B

string

algorithm

hamming-distance