difflib sequence matcher missing common substrings
When trying to find the common substrings between two strings, SequenceMatcher does not return all of the expected common substrings.
s1 = '++%2F%2F+Prints+%22Hello%2C+World%22+to+the+terminal+window.%0A++++++++System.out.pr%29%3B%0A++++%7D%0A%7D%0ASample+program%0Apublic+static+voclass+id+main%28String%5B%5D+args%29+'
s2 = 'gs%29+%7B%0A++++++++%2F'
# The common substrings are '+%', '%0A++++++++', '%2' and 'gs%29+',
# but 'gs%29+' is not matched.
import difflib as d
seqmatch = d.SequenceMatcher(None,s1,s2)
matches = seqmatch.get_matching_blocks()
for match in matches:
    apos, bpos, matchlen = match
    print(s1[apos:apos+matchlen])
Output:
+%
%0A++++++++
%2
'gs%29+' is a common substring of s1 and s2, but SequenceMatcher does not find it.
Am I missing something?
Thanks
The junk characters may be confusing the algorithm. I passed a lambda as the isjunk argument of SequenceMatcher():
s1 = '++%2F%2F+Prints+%22Hello%2C+World%22+to+the+terminal+window.%0A++++++++System.out.pr%29%3B%0A++++%7D%0A%7D%0ASample+program%0Apublic+static+voclass+id+main%28String%5B%5D+args%29+'
s2 = 'gs%29+%7B%0A++++++++%2F'
# The expected substring is 'gs%29+'
import difflib as d
seqmatch = d.SequenceMatcher(lambda x: x in "+", s1, s2)
matches = seqmatch.get_matching_blocks()
for match in matches:
    apos, bpos, matchlen = match
    print(s1[apos:apos+matchlen])
The output is now:
gs%29+
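As far as I can tell, the reason this works is that get_matching_blocks() does not list every common substring. It returns blocks that increase monotonically in both strings, so the result is an alignment. 'gs%29+' sits at the very end of s1 but at the very start of s2, so once the longer '%0A++++++++' is picked first, 'gs%29+' cannot be part of the same alignment. Treating '+' as junk shortens that competing match, and 'gs%29+' then becomes the longest one. Here is a small sketch that prints the offsets to make this visible (show_blocks is just a helper name I made up for illustration):

```python
import difflib

s1 = '++%2F%2F+Prints+%22Hello%2C+World%22+to+the+terminal+window.%0A++++++++System.out.pr%29%3B%0A++++%7D%0A%7D%0ASample+program%0Apublic+static+voclass+id+main%28String%5B%5D+args%29+'
s2 = 'gs%29+%7B%0A++++++++%2F'


def show_blocks(isjunk):
    """Print each non-empty matching block with its offsets in s1 and s2."""
    matcher = difflib.SequenceMatcher(isjunk, s1, s2)
    # The last block returned is always a zero-length sentinel, so drop it.
    blocks = [m for m in matcher.get_matching_blocks() if m.size]
    for m in blocks:
        print(f"s1[{m.a}] s2[{m.b}] {s1[m.a:m.a + m.size]!r}")
    return blocks


print("isjunk=None")
default_blocks = show_blocks(None)

print("isjunk treats '+' as junk")
junk_blocks = show_blocks(lambda ch: ch == "+")

# Where 'gs%29+' lives in each string: end of s1, start of s2.
print(s1.find('gs%29+'), len(s1), s2.find('gs%29+'))
```

Note that with the junk lambda you only get 'gs%29+' back and lose the other blocks, so this is a trade-off and not a way to get all common substrings at once.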