Match elements in two strings when whitespace is inserted in one of them

Question

I have a large number of string pairs, for example:

s1 = 'newyork city lights are yellow'
s2 = ' the city of new york is large'

I want to write a function that takes s1 and s2 (in either order) and outputs:

s1_output = 'new york city lights are yellow'
s2_output = 'the city of new york is large'

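so that newyork in s1 is split into new york. Failing that, I would at least like a way to find elements in one string that match elements in the other once a single character is inserted.

The matching tokens are not known in advance, and they are not guaranteed to appear in the text. Any ideas?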

Answer 1

Something like this could work:

s1 = 'newyork city lights are yellow'
s2 = ' the city of new york is large'

# split() with no arguments splits on any run of whitespace and ignores
# leading/trailing whitespace, so no separate strip() is needed
words_s1 = s1.split()
words_s2 = s2.split()

# For each word in list 1, compare it to each pair of adjacent (concatenated) words in list 2
# enumerate() gives the true position even if the same word occurs more than once
for j, word in enumerate(words_s1):
    for i in range(len(words_s2) - 1):
        if word == words_s2[i] + words_s2[i + 1]:
            print(f"Word #{j} in s1 matches words #{i} and #{i+1} in s2")

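This matches the words the way you described. The basic idea is to iterate over list 1 and check each word against every pair of adjacent words in list 2.

You can then also loop the other way around (iterate over s2 and check whether each word equals two adjacent words in s1 joined together) so that both directions are covered.

You will need to keep track of where the matches occur, and then use that information to build the new string.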
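As a rough sketch of that last step, the two directions and the rebuilding of the strings could be combined as follows. The function names split_merged_words and normalize_pair are made up for illustration, and the sketch assumes a merged word corresponds to exactly two adjacent words in the other string:

```python
def split_merged_words(source, reference):
    """Split each word of `source` that equals two adjacent words of `reference` joined together."""
    ref_words = reference.split()
    # Map every concatenated adjacent pair to its spaced form, e.g. 'newyork' -> 'new york'
    pairs = {a + b: f"{a} {b}" for a, b in zip(ref_words, ref_words[1:])}
    # Replace matching words and leave all others unchanged
    return " ".join(pairs.get(word, word) for word in source.split())


def normalize_pair(s1, s2):
    # Check both directions, since either string may contain the merged word
    return split_merged_words(s1, s2), split_merged_words(s2, s1)


s1 = 'newyork city lights are yellow'
s2 = ' the city of new york is large'
s1_output, s2_output = normalize_pair(s1, s2)
print(s1_output)  # new york city lights are yellow
print(s2_output)  # the city of new york is large
```

The dictionary lookup avoids the nested loop, so each string is scanned only once per direction. One caveat: an ordinary word such as "into" would also be split if the other string happens to contain "in to", so some filtering may be needed on real data.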

python

string

nlp

string-matching