Modification to Hamming distance / edit distance

I'm having trouble modifying the Hamming distance algorithm so that it treats my data differently in two ways:

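  1. If an uppercase letter is switched to lowercase (or vice versa), add .5 to the Hamming distance, unless the letter is in the first position.
    Examples: "Killer" and "killer" have a distance of 0; "killer" and "KiLler" have a Hamming distance of 0.5; "Funny" and "FAnny" have a distance of 1.5 (1 for the different letter, plus .5 for the different case).

  2. Treat b and d (and their uppercase counterparts) as the same letter.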

Here is the code I found for the basic Hamming distance program:

def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'mark'
    b = 'Make'
    print(hamming_distance(a, b))

Any suggestions are welcome!

Here is a simple solution. It could certainly be optimized for better performance.

Note: I used Python 3, since Python 2 will be retired soon.

def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    # b and d are interchangeable
    s1 = s1.replace('b', 'd').replace('B', 'D')
    s2 = s2.replace('b', 'd').replace('B', 'D')
    # add 1 for each different character
    hammingdist = sum(ch1 != ch2 for ch1, ch2 in zip(s1.lower(), s2.lower()))
    # add .5 for each lower/upper case difference (skipping the first letter)
    for i in range(1, len(s1)):
        c1, c2 = s1[i], s2[i]
        # a bool counts as 0 or 1, so each case mismatch contributes 0.5
        hammingdist += 0.5 * (('a' <= c1 <= 'z' and 'A' <= c2 <= 'Z') or
                              ('A' <= c1 <= 'Z' and 'a' <= c2 <= 'z'))
    return hammingdist

def print_hamming_distance(s1, s2):
    print("hamming distance between", s1, "and", s2, "is",
          hamming_distance(s1, s2))

if __name__ == "__main__":
    assert hamming_distance('mark', 'Make') == 2
    assert hamming_distance('Killer', 'killer') == 0
    assert hamming_distance('killer', 'KiLler') == 0.5
    assert hamming_distance('bole', 'dole') == 0
    print("all fine")
    print_hamming_distance("organized", "orGanised")
    # prints: hamming distance between organized and orGanised is 1.5
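
As for the remark that this could be optimized: one possible direction is to fold the letter comparison and the case check into a single loop. The following is only a sketch under some assumptions, not a measured improvement. The names `hamming_distance_single_pass`, `_BD_TABLE`, `_LOWER` and `_UPPER` are made up for this example, and it raises `ValueError` instead of using `assert` (asserts are skipped when Python runs with `-O`). For ASCII input it should give the same results as the version above.

```python
import string

# Hypothetical helper names, not part of the original answer.
_BD_TABLE = str.maketrans("bB", "dD")      # treat b/d (and B/D) as the same letter
_LOWER = frozenset(string.ascii_lowercase)
_UPPER = frozenset(string.ascii_uppercase)


def hamming_distance_single_pass(s1, s2):
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    s1 = s1.translate(_BD_TABLE)
    s2 = s2.translate(_BD_TABLE)
    dist = 0.0
    for i, (c1, c2) in enumerate(zip(s1, s2)):
        # add 1 when the letters differ, ignoring case
        if c1.lower() != c2.lower():
            dist += 1
        # add .5 when one is lowercase and the other uppercase,
        # except in the first position
        if i > 0 and ((c1 in _LOWER and c2 in _UPPER) or
                      (c1 in _UPPER and c2 in _LOWER)):
            dist += 0.5
    return dist


if __name__ == "__main__":
    print(hamming_distance_single_pass("Funny", "FAnny"))
```

Whether this is actually faster would need to be timed on real data; the main difference is that each string is walked once rather than twice.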