Invert get_matching_blocks results from difflib in Python 2.7 and get the mismatched blocks
The following Python 2.7 example returns the matching blocks of string1 and string2:
import difflib
string1 = "This is a test"
string2 = "This ain't a testament"
s = difflib.SequenceMatcher(lambda x: x == " ", string1, string2)
for block in s.get_matching_blocks():
    a,b,size = block
    print "string1[%s] and string2[%s] match for %s characters" % block
The program above prints the following:
string1[0] and string2[0] match for 5 characters
string1[5] and string2[6] match for 1 characters
string1[7] and string2[10] match for 7 characters
string1[14] and string2[22] match for 0 characters
I would like to invert the result and return the mismatched blocks of string1 and string2, like this:
string1[6] mismatch for 1 characters
string2[5] mismatch for 1 characters
string2[7] mismatch for 3 characters
string2[17] mismatch for 5 characters
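Note: the total number of matching blocks is the same for both strings, but the mismatched blocks differ from one string to the other.
Here is a color-coded representation of the strings, where black = match and red = mismatch.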
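It seems to me that you should be able to walk through the matching blocks and compute the mismatched parts from the gaps between them. A quick solution is pasted below (read: "tested only with the input in the question"). See whether it helps you work out the final solution.
Note: I only have a Python 3 interpreter available right now, but since the question is not version-specific, I am posting this solution anyway.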
import difflib
string1 = "This is a test"
string2 = "This ain't a testament"
s = difflib.SequenceMatcher(lambda x: x == " ", string1, string2)
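s1_miss = list()   # mismatched spans in string1: (start, end, length)
s2_miss = list()   # mismatched spans in string2: (start, end, length)
s1_cur_off = 0     # first offset in string1 not yet covered by a match
s2_cur_off = 0     # first offset in string2 not yet covered by a match
for block in s.get_matching_blocks():
    a,b,size = block
    print("string1[%s] and string2[%s] match for %s characters" % block)
    # A gap between the previous match and this one is a mismatch.
    if a > s1_cur_off:
        s1_miss.append((s1_cur_off, a-1, a-1-s1_cur_off + 1))
    s1_cur_off = a + size
    if b > s2_cur_off:
        s2_miss.append((s2_cur_off, b-1, b-1-s2_cur_off + 1))
    s2_cur_off = b + size
print(s1_miss)
print(s2_miss)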
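Output:
The script dumps the mismatch list for each string. Each element of a list is a three-tuple: the start offset, the (inclusive) end offset, and the length of the mismatch (the length is mainly there for debugging).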
string1[0] and string2[0] match for 5 characters
string1[5] and string2[6] match for 1 characters
string1[7] and string2[10] match for 7 characters
string1[14] and string2[22] match for 0 characters
[(6, 6, 1)]
[(5, 5, 1), (7, 9, 3), (17, 21, 5)]
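If you want the output in the exact format asked for in the question, the same loop can be wrapped in a small helper. This is only a sketch: `mismatched_spans` is a name made up for illustration, it returns (start, length) pairs rather than the three-tuples used above, and the block list is copied by hand from the output in the question instead of being computed. In real use you would pass `s.get_matching_blocks()` directly, including its final zero-length block.

```python
def mismatched_spans(blocks):
    """Return (a_gaps, b_gaps), each a list of (start, length) tuples.

    `blocks` is a sequence of (a, b, size) triples in the shape produced by
    SequenceMatcher.get_matching_blocks(), ending with the zero-length
    sentinel block (len(a), len(b), 0).
    """
    a_gaps, b_gaps = [], []
    a_pos = b_pos = 0  # first offset not yet covered by a matching block
    for a, b, size in blocks:
        if a > a_pos:  # gap in the first sequence before this match
            a_gaps.append((a_pos, a - a_pos))
        if b > b_pos:  # gap in the second sequence before this match
            b_gaps.append((b_pos, b - b_pos))
        a_pos, b_pos = a + size, b + size
    return a_gaps, b_gaps


# Matching blocks copied from the output shown in the question.
blocks = [(0, 0, 5), (5, 6, 1), (7, 10, 7), (14, 22, 0)]
string1_gaps, string2_gaps = mismatched_spans(blocks)

for start, length in string1_gaps:
    print("string1[%s] mismatch for %s characters" % (start, length))
for start, length in string2_gaps:
    print("string2[%s] mismatch for %s characters" % (start, length))
```

The trailing sentinel block matters here: without it, a mismatch at the very end of either string (such as "ament" in string2) would not be reported.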
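As an alternative to walking the matching blocks by hand, `SequenceMatcher.get_opcodes()` describes how to turn the first sequence into the second as a list of `(tag, i1, i2, j1, j2)` tuples, where every tag other than `'equal'` marks a mismatched range. The sketch below collects those ranges. The helper name `mismatched_spans_from_opcodes` is hypothetical, and I have not checked it against anything beyond small inputs, so treat it as a starting point.

```python
import difflib


def mismatched_spans_from_opcodes(matcher):
    """Collect the non-'equal' ranges of both sequences as (start, length)."""
    a_gaps, b_gaps = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i2 > i1:  # 'replace' or 'delete': part of the first sequence is unmatched
            a_gaps.append((i1, i2 - i1))
        if j2 > j1:  # 'replace' or 'insert': part of the second sequence is unmatched
            b_gaps.append((j1, j2 - j1))
    return a_gaps, b_gaps


string1 = "This is a test"
string2 = "This ain't a testament"
s = difflib.SequenceMatcher(lambda x: x == " ", string1, string2)
string1_gaps, string2_gaps = mismatched_spans_from_opcodes(s)
print(string1_gaps)
print(string2_gaps)
```

Because the opcodes describe the same alignment as the matching blocks, the spans should agree with the gaps between the blocks returned by `get_matching_blocks()`. The tags also tell you whether a range was replaced, deleted, or inserted, should you need that distinction.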