Python: matching words at the same index in two strings

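I have two strings of equal length and want to match the words that sit at the same index. I am also trying to group consecutive matches together, and that is where I am running into trouble.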

For example, I have these two strings:

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

The result I am looking for is:

['I am','show']

My current code is:

keys = []
for x in alligned1.split():
    for i in alligned2.split():
        if x == i:
            keys.append(x)

This gives me:

['I','am','show']

Any guidance or help would be appreciated.

Finding the matching words is fairly simple, but putting them into contiguous groups is trickier. I suggest using itertools.groupby.

import itertools

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

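results = []
# Pair up the words that share an index
word_pairs = zip(alligned1.split(), alligned2.split())
# groupby yields runs of consecutive pairs with the same key (True = words match)
for k, v in itertools.groupby(word_pairs, key=lambda pair: pair[0] == pair[1]):
    if k:
        words = [pair[0] for pair in v]
        results.append(" ".join(words))

print(results)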

Result:

['I am', 'show']
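
To see why groupby fits here, it can help to look at every group it yields before the non-matching ones are discarded. The following is a minimal Python 3 sketch, not part of the original answer; the helper name matching_runs is made up for illustration.

```python
from itertools import groupby

def matching_runs(s1, s2):
    """Join consecutive words that are equal at the same index (hypothetical helper)."""
    pairs = zip(s1.split(), s2.split())
    runs = []
    # groupby starts a new group each time the key (match / no match) changes
    for is_match, group in groupby(pairs, key=lambda pair: pair[0] == pair[1]):
        words = [w1 for w1, _ in group]
        print(is_match, words)  # show every group, matching or not
        if is_match:
            runs.append(" ".join(words))
    return runs

result = matching_runs('I am going to go to some show',
                       'I am not going to go the show')
print(result)
```

For the two sample strings this prints three groups: a matching one, a non-matching one, and another matching one. Only the matching groups end up in the result.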

The code can be simplified to:

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

keys = []
for i, word in enumerate(alligned1.split()): 
    if word == alligned2.split()[i]:
        keys.append(word)
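
The same position-by-position comparison can also be written with zip, which pairs the words up directly and avoids calling split() on every iteration. This is only a sketch in Python 3, and it assumes that any extra words in the longer string should be ignored when the word counts differ.

```python
alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

# zip stops at the shorter list, so no index can run out of range
keys = [w1 for w1, w2 in zip(alligned1.split(), alligned2.split()) if w1 == w2]
print(keys)
```

Like the loop, this collects individual words and does not yet group consecutive matches.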

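Then we need to keep track of whether we have just matched a word. Let's do that with a flag variable, prev, which also holds the run matched so far.

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

keys = []
prev = ''
for i, word in enumerate(alligned1.split()): 
    if word == alligned2.split()[i]:
        prev = prev + ' ' + word if prev else word

    elif prev:
        keys.append(prev)
        prev = ''

# A run that reaches the last word (here 'show') is still pending after the loop
if prev:
    keys.append(prev)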
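
One detail worth checking with this approach is a match that runs up to the last word, as 'show' does in the example: the accumulated string has to be appended once the loop has finished. Below is a sketch in Python 3 that wraps the idea in a function (the name group_matches is made up here) and performs that final append.

```python
def group_matches(s1, s2):
    """Collect runs of words that match at the same index (hypothetical helper)."""
    keys = []
    prev = ''  # the run of matching words built so far
    for w1, w2 in zip(s1.split(), s2.split()):
        if w1 == w2:
            prev = prev + ' ' + w1 if prev else w1
        elif prev:
            keys.append(prev)
            prev = ''
    if prev:  # a run that reaches the last word is still pending here
        keys.append(prev)
    return keys

print(group_matches('I am going to go to some show',
                    'I am not going to go the show'))
```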

Perhaps not very elegant, but it works:

from itertools import zip_longest  # izip_longest in Python 2

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

curr_match = ''
matches = []
for w1, w2 in zip_longest(alligned1.split(), alligned2.split()):
    if w1 != w2:
        # A mismatch ends the current run, if there is one
        if curr_match:
            matches.append(curr_match)
            curr_match = ''
        continue
    if curr_match:
        curr_match += ' '
    curr_match += w1
# Flush a run that reaches the end of the strings
if curr_match:
    matches.append(curr_match)

print(matches)

Result:

['I am', 'show']
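
The zip_longest function (izip_longest in Python 2) pads the shorter word list with None, so the loop above also runs when the two strings have different word counts. A padded None never equals a real word, so it simply ends the current run. Here is a short Python 3 sketch of that behaviour, using made-up sample strings:

```python
from itertools import zip_longest

words1 = 'same start but longer tail here'.split()
words2 = 'same start and'.split()

# Missing words on the shorter side are filled with None
pairs = list(zip_longest(words1, words2))
print(pairs[-1])

# True where the words at an index are equal, False otherwise
flags = [w1 == w2 for w1, w2 in pairs]
print(flags)
```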

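Well, the earlier answer is the best and spot on. I tried doing it the brute-force way. It doesn't look pretty, but it gets the job done without any imports.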

alligned1 = 'I am going to go to some show'.split(' ')
alligned2 = 'I am not going to go the show'.split(' ')
keys = []
# Keep a word where both lists agree at that index, otherwise None
temp = [v if v == alligned1[i] else None for i, v in enumerate(alligned2)]
temp.append(None)  # sentinel so the last run gets flushed
tmpstr = ''
for i in temp:
    if i:
        tmpstr += i + ' '
    else:
        if tmpstr:
            keys.append(tmpstr)
        tmpstr = ''
keys = [i.strip() for i in keys]
print(keys)

Output:

['I am', 'show']