Retrieve the longest matching value from a list by matching with another list [Python 2.7]
There are two lists to match: li_a is the given list, made up of the character sequence of a sentence, and li_b is a collection of words.
li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
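The process is to iteratively match the items of li_a against the items of li_b. If the first character of li_a matches an item of li_b, it is concatenated with the next character and the check is repeated until the longest match is reached. That longest term is then split off, and the process continues the same way until the end of li_a. Characters and words in li_a that do not appear in li_b are appended as-is.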
The final result should look like this:
new_li = ['These','45','are','cars']
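A minimal sketch of the process described above, assuming "longest match" means the longest run of consecutive li_a elements whose concatenation appears in li_b (not necessarily reached through prefixes that are themselves in li_b). The function name `split_longest_match` is hypothetical, and li_b is converted to a set for fast membership tests.

```python
def split_longest_match(chars, words):
    """Greedy longest-match split of `chars` (a list of string pieces).

    At each position, join following pieces and keep the longest join that
    is in `words`; a piece that starts no match is emitted unchanged.
    """
    vocab = set(words)  # O(1) membership tests instead of scanning the list
    longest_word = max(len(w) for w in vocab) if vocab else 0
    result = []
    i = 0
    while i < len(chars):
        best_end = None
        candidate = ''
        for j in range(i, len(chars)):
            candidate += chars[j]
            if len(candidate) > longest_word:
                break  # no word in the vocabulary is this long
            if candidate in vocab:
                best_end = j + 1  # remember the longest match seen so far
        if best_end is None:
            # unknown piece: append it as-is and move on
            result.append(chars[i])
            i += 1
        else:
            result.append(''.join(chars[i:best_end]))
            i = best_end
    return result


li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
print(split_longest_match(li_a, li_b))  # ['These', '45', 'are', 'cars']
```

Because it tries every longer candidate instead of stopping at the first miss, this sketch gives the same result even if a prefix such as 'Thes' is missing from li_b.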
My attempt so far is below, but it works on two strings, not on lists, and it does not retrieve unrecognized words.
def longest_matched_substring(s1, s2):
    m = [[0] * (1 + len(s2)) for i in xrange(1 + len(s1))]
    longest, x_longest = 0, 0
    for x in xrange(1, 1 + len(s1)):
        for y in xrange(1, 1 + len(s2)):
            if s1[x - 1] == s2[y - 1]:
                m[x][y] = m[x - 1][y - 1] + 1
                if m[x][y] > longest:
                    longest = m[x][y]
                    x_longest = x
            else:
                m[x][y] = 0
    return s1[x_longest - longest: x_longest]
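For reference, the attempt above is the standard longest-common-substring dynamic program. Writing m(x, y) for the table entry m[x][y], i.e. the length of the longest common run ending at s1[x-1] and s2[y-1], the loop fills the table with:

```latex
m_{x,y} =
\begin{cases}
m_{x-1,y-1} + 1 & \text{if } s_1[x-1] = s_2[y-1] \\
0 & \text{otherwise}
\end{cases}
\qquad m_{0,y} = m_{x,0} = 0,
\qquad L = \max_{x,y} m_{x,y},
\qquad \text{result} = s_1[x^{*} - L : x^{*}]
```

Here x* is the earliest position in s1 where the maximum L is reached (the `>` comparison keeps the first one). Since it returns a single slice, the one longest run shared by both inputs, it cannot on its own split li_a into several words.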
You can do this with two for loops and a temporary variable, as follows:
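def longest_matched_substring(li1, li2):
    new_li = []
    tmp = ''
    for a in li1:
        tmp += a
        # count how many words in li2 equal the current token
        count = 0
        for b in li2:
            if tmp == b:
                count += 1
        if count == 0:
            # token stopped matching: emit it without the element just added
            tmp1 = tmp[:-len(a)]
            if tmp1:
                new_li.append(tmp1)
            tmp = a
    # emit the last token, known or unknown
    if tmp:
        new_li.append(tmp)
    return new_li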
Input:
li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
print longest_matched_substring(li_a, li_b)
Output:
['These', '45', 'are', 'cars']
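The loop above only keeps extending a token while every prefix of it is in li_b, and the inner `for b in li2` scan can be replaced by a set lookup. A minimal sketch of the same logic, assuming a set is acceptable for li_b (the names `split_by_prefixes` and `li_b_new` are hypothetical), also shows what happens when 'Thes' is removed from li_b: 'These' is broken into fragments, which is the new scenario handled next.

```python
def split_by_prefixes(chars, words):
    """Extend the current token while it stays in `words`; emit it on the first miss."""
    vocab = set(words)  # set lookup replaces the inner `for b in li2` scan
    result = []
    tmp = ''
    for a in chars:
        if tmp + a in vocab:
            tmp += a                # still a known prefix: keep extending
        else:
            if tmp:
                result.append(tmp)  # emit what matched so far
            tmp = a                 # restart from the element that broke the match
    if tmp:
        result.append(tmp)          # last token, known or unknown
    return result


li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
li_b_new = ['T','Th','The','These','a','ar','are','c','ca','car','cars']  # no 'Thes'

print(split_by_prefixes(li_a, li_b))      # ['These', '45', 'are', 'cars']
print(split_by_prefixes(li_a, li_b_new))  # ['The', 's', 'e', '45', 'are', 'cars']
```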
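For the new scenario, where li_b no longer contains every prefix of a word ('Thes' is missing from the li_b used below), you can modify the function as follows: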
def longest_matched_substring(li1, li2):
    new_li = []
    tmp = ''
    # first pass: same prefix-extension loop as above
    for a in li1:
        tmp += a
        count = 0
        for b in li2:
            if tmp == b:
                count += 1
        if count == 0:
            tmp1 = tmp[:-len(a)]  # drop only the element just added
            if tmp1:
                new_li.append(tmp1)
            tmp = a
    # emit the last token, known or unknown
    if tmp:
        new_li.append(tmp)
    # second pass: merge consecutive fragments whose concatenation is a word in li2
    i = 0
    while i < len(new_li):
        tmp2 = new_li[i]
        for j in range(i + 1, len(new_li)):
            tmp2 += new_li[j]
            if tmp2 in li2:
                # the merged word replaces fragments i..j in place, so order is kept
                new_li[i:j + 1] = [tmp2]
                break
        i += 1
    return new_li
Input:
li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','These','a','ar','are','c','ca','car','cars']
print longest_matched_substring(li_a, li_b)
Output:
['These', '45', 'are', 'cars']