Modification to Hamming distance / edit distance
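I'm having trouble modifying the Hamming distance algorithm so that it treats my data differently in two ways:
Add .5 to the Hamming distance if a capital letter is switched for a lower-case letter, unless it is in the first position.
Examples: "Killer" and "killer" have a distance of 0; "killer" and "KiLler" have a Hamming distance of 0.5; "Funny" and "FAnny" have a distance of 1.5 (1 for the different letter, plus .5 for the different capitalization).
Make b and d (and their capitalized counterparts) count as the same letter.
Here is the code I found that makes up the basic Hamming program: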
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == "__main__":
    a = 'mark'
    b = 'Make'
    print(hamming_distance(a, b))
Any suggestions are welcome!
Here is a simple solution. It can certainly be optimized for better performance.
Note: I used Python 3, because Python 2 will be retired soon.
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    # b and d are interchangeable
    s1 = s1.replace('b', 'd').replace('B', 'D')
    s2 = s2.replace('b', 'd').replace('B', 'D')
    # add 1 for each different character
    hammingdist = sum(ch1 != ch2 for ch1, ch2 in zip(s1.lower(), s2.lower()))
    # add .5 for each lower/upper case difference (without first letter)
    for i in range(1, len(s1)):
        hammingdist += 0.5 * ('a' <= s1[i] <= 'z' and
                              'A' <= s2[i] <= 'Z' or
                              'A' <= s1[i] <= 'Z' and
                              'a' <= s2[i] <= 'z')
    return hammingdist

def print_hamming_distance(s1, s2):
    print("hamming distance between", s1, "and", s2, "is",
          hamming_distance(s1, s2))

if __name__ == "__main__":
    assert hamming_distance('mark', 'Make') == 2
    assert hamming_distance('Killer', 'killer') == 0
    assert hamming_distance('killer', 'KiLler') == 0.5
    assert hamming_distance('bole', 'dole') == 0
    print("all fine")
    print_hamming_distance("organized", "orGanised")
    # prints: hamming distance between organized and orGanised is 1.5
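As for the remark about optimizing, one possible direction is to walk both strings only once. The following is a minimal sketch rather than a tuned implementation. The name hamming_distance_single_pass and the table _B_TO_D are made up for illustration. It uses str.translate for the b/d rule and str.islower/str.isupper for the case check, so it should agree with the function above on plain ASCII input, but it may behave differently on non-ASCII letters.

```python
# Illustrative names only; not part of the original answer.
_B_TO_D = str.maketrans('bB', 'dD')  # treat b/d (and B/D) as the same letter


def hamming_distance_single_pass(s1, s2):
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    s1, s2 = s1.translate(_B_TO_D), s2.translate(_B_TO_D)
    dist = 0.0
    for i, (c1, c2) in enumerate(zip(s1, s2)):
        # add 1 when the letters differ, ignoring case
        if c1.lower() != c2.lower():
            dist += 1
        # add .5 for a case switch anywhere except the first position
        if i > 0 and ((c1.islower() and c2.isupper()) or
                      (c1.isupper() and c2.islower())):
            dist += 0.5
    return dist


if __name__ == "__main__":
    print(hamming_distance_single_pass("Funny", "FAnny"))  # 1.5
```

Raising ValueError instead of using assert is a design choice made here, so the length check still runs when Python is started with -O.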