Python difflib: not detecting changes
import difflib
test1 = ")\n )"
test2 = "#)\n #)"
d = difflib.Differ()
diff = d.compare(test1.splitlines(), test2.splitlines())
print("\n".join(diff))
Output:
- )
+ #)
-  )
+  #)
?  +
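As you can see, it does not detect the change on the first line (there is no ? line), but it does detect the change on the second line.
Does anyone know why difflib treats this as a delete/add rather than a change?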
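Single-character strings are an edge case. For strings of two or more characters, inserting a single character is always detected correctly. Here is a simple script that demonstrates this: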
import difflib

def show_diffs(limit):
    characters = 'abcdefghijklmnopqrstuvwxyz'
    differ = difflib.Differ()
    for length in range(1, limit + 1):
        for pos in range(0, length + 1):
            line_a = characters[:length]
            line_b = line_a[:pos] + 'A' + line_a[pos:]
            diff = list(differ.compare([line_a], [line_b]))
            if len(diff) == 2 and diff[0][0] == '-' and diff[1][0] == '+':
                marker = 'N'  # Insertion not detected
            elif len(diff) == 3 and diff[0][0] == '-' and diff[1][0] == '+' and diff[2][0] == '?':
                marker = 'Y'  # Insertion detected
            else:
                print('ERROR: unexpected diff for %r -> %r:\n%r' % (line_a, line_b, diff))
                return
            print('%s %r -> %r' % (marker, line_a, line_b))

show_diffs(limit=3)
It only "fails" for 1-character strings:
N 'a' -> 'Aa'
N 'a' -> 'aA'
Y 'ab' -> 'Aab'
Y 'ab' -> 'aAb'
Y 'ab' -> 'abA'
Y 'abc' -> 'Aabc'
Y 'abc' -> 'aAbc'
Y 'abc' -> 'abAc'
Y 'abc' -> 'abcA'
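A likely explanation for the pattern above, offered as a sketch rather than a definitive account: CPython's Differ appears to produce an intraline (? line) diff only when the two lines are similar enough, and the source code uses a cutoff of roughly 0.75 on the SequenceMatcher ratio. That cutoff is an internal implementation detail, not a documented API, so the constant ASSUMED_CUTOFF and the helper similarity below are illustrative names, not part of difflib.

```python
import difflib

# Assumption: mirrors the similarity threshold used inside CPython's Differ.
# It is an internal detail and may differ between Python versions.
ASSUMED_CUTOFF = 0.75

def similarity(old, new):
    """Return the character-level similarity ratio (0.0 to 1.0) of two lines."""
    return difflib.SequenceMatcher(None, old, new).ratio()

pairs = [(")", "#)"), (" )", " #)"), ("a", "Aa"), ("ab", "Aab")]
for old, new in pairs:
    ratio = similarity(old, new)
    if ratio >= ASSUMED_CUTOFF:
        verdict = "similar enough for an intraline diff"
    else:
        verdict = "treated as a plain delete/add"
    print("%r -> %r: ratio %.3f, %s" % (old, new, ratio, verdict))
```

The ratio is 2*M/T, where M is the number of matching characters and T is the combined length of both strings. Inserting one character into a line of length n gives 2n/(2n+1): about 0.667 for n = 1, which falls under the assumed cutoff, and 0.8 for n = 2, which clears it. That matches the N and Y markers in the output above.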