Python: find similar sequences in a string
I want code that returns the sum of the lengths of all similar sequences in two strings. I wrote the code below, but it only counts one of them:
from difflib import SequenceMatcher

a = 'Apple Banana'
b = 'Banana Apple'

def similar(a, b):
    c = SequenceMatcher(None, a.lower(), b.lower()).get_matching_blocks()
    # Add up the sizes of all matching blocks longer than one character.
    return sum(block.size for block in c if block.size > 1)

print(similar(a, b))
The output is:
6
I expected: 11
If we edit your code like this, it shows where the 6 comes from:
from difflib import SequenceMatcher

a = 'Apple Banana'
b = 'Banana Apple'

def similar(a, b):
    c = SequenceMatcher(None, a.lower(), b.lower()).get_matching_blocks()
    for block in c:
        # Each block is a tuple (start in a, start in b, size).
        print("a[%d] and b[%d] match for %d elements" % block)

similar(a, b)
a[6] and b[0] match for 6 elements
a[12] and b[12] match for 0 elements
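get_matching_blocks() returns a list of non-overlapping matching blocks that appear in the same order in both strings. It finds the longest contiguous match first, which here is 'banana' with length 6, and then only looks for further matches to the left and to the right of that block in both strings. Since 'apple' comes before 'banana' in a but after it in b, it cannot be reported as a second block. The final entry is a zero-size sentinel, so the sum is 6.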
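To make the ordering constraint concrete, here is a minimal Python 3 sketch that prints each block together with the text it covers. The variable names `blocks` and `sizes` are only illustrative.

```python
from difflib import SequenceMatcher

a = 'Apple Banana'.lower()
b = 'Banana Apple'.lower()

blocks = SequenceMatcher(None, a, b).get_matching_blocks()
for block in blocks:
    # Each block is a named tuple (a, b, size); the last one is a zero-size sentinel.
    print(block, repr(a[block.a:block.a + block.size]))

# Sizes of the real matches, excluding the sentinel.
sizes = [block.size for block in blocks if block.size > 0]
print(sizes)
```

Only the 'banana' block shows up, because 'apple' would have to be matched "backwards" relative to it.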
Try this:
def similar(a, b):
    # Start with a dummy value so the loop condition passes the first time.
    c = 'something'
    total = 0
    while len(c) != 1:
        c = SequenceMatcher(lambda x: x == ' ', a.lower(), b.lower()).get_matching_blocks()
        sizes = [block.size for block in c]
        i = sizes.index(max(sizes))  # index of the largest block
        total += max(sizes)
        # Cut the matched block out of both strings before matching again.
        a = a[0:c[i].a] + a[c[i].a + c[i].size:]
        b = b[0:c[i].b] + b[c[i].b + c[i].size:]
    return total
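This "subtracts" the matching part from both strings and matches them again until len(c) is 1, which happens when there are no matches left and only the zero-size sentinel block remains.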
However, this script does not ignore spaces. To do that, I used the suggestion from this other SO answer: just preprocess the strings before passing them to the function:
a = 'Apple Banana'.replace(' ', '')
b = 'Banana Apple'.replace(' ', '')
You can also include this part in the function.
I made a small change to your code and it works very well, thanks @Antimony
def similar(a, b):
    # Strip spaces up front so they never count as matches.
    a = a.replace(' ', '')
    b = b.replace(' ', '')
    # Start with a dummy value so the loop condition passes the first time.
    c = 'something'
    total = 0
    while len(c) != 1:
        c = SequenceMatcher(lambda x: x == ' ', a.lower(), b.lower()).get_matching_blocks()
        sizes = [block.size for block in c]
        i = sizes.index(max(sizes))  # index of the largest block
        total += max(sizes)
        # Cut the matched block out of both strings before matching again.
        a = a[0:c[i].a] + a[c[i].a + c[i].size:]
        b = b[0:c[i].b] + b[c[i].b + c[i].size:]
    return total
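For anyone on Python 3, the same "cut out the longest match and repeat" idea can be sketched with find_longest_match(). This is one possible variant, not the code from the answers above: the function name `total_shared_length` and the `min_size` parameter are made up for illustration.

```python
from difflib import SequenceMatcher


def total_shared_length(a, b, min_size=1):
    """Sum the sizes of the blocks found by repeatedly removing the longest match."""
    # Normalise once: drop spaces and ignore case.
    a = a.replace(' ', '').lower()
    b = b.replace(' ', '').lower()
    total = 0
    while a and b:
        # autojunk=False switches off difflib's popular-element heuristic for long inputs.
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        match = matcher.find_longest_match(0, len(a), 0, len(b))
        if match.size < min_size:
            break  # nothing (long enough) left to match
        total += match.size
        # Cut the matched block out of both strings before matching again.
        a = a[:match.a] + a[match.a + match.size:]
        b = b[:match.b] + b[match.b + match.size:]
    return total


print(total_shared_length('Apple Banana', 'Banana Apple'))  # 11
```

Note that cutting a block out joins the text on either side of it, so later rounds can match characters that were not adjacent in the original strings. Setting min_size=2 skips single-character matches, similar to the size>1 filter in the question.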