How to find matching words in two strings?
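I have two strings:
My name is Bogdan
and
Bogdan and I am from Russia
I need to get the word Bogdan from these strings. I always know that the end of the first sentence == the beginning of the second sentence.
How can I find the overlap?
My solution returns the shared characters instead: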
res = list(set('My name is Bogdan').intersection(set('Bogdan and i am from Russia')))
print(res)
Returns
['i', 'n', 'g', 'm', ' ', 's', 'B', 'a', 'd', 'o']
You first overlap the two strings as much as possible, then iterate by reducing the overlap:
def find_overlap(s1, s2):
    for i in range(len(s1)):
        test1, test2 = s1[i:], s2[:len(s1) - i]
        if test1 == test2:
            return test1
s1, s2 = "My name is Bogdan", "Bogdan and I am from Russia"
find_overlap(s1, s2)
# 'Bogdan'
s1, s2 = "mynameisbogdan", "bogdanand"
find_overlap(s1, s2)
# 'bogdan'
As you can see, this also works if the two strings contain no spaces.
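This takes O(n) iterations, where n is the length of the first string (each iteration also compares two slices, so the total work can be higher). If you first determine which of the two strings is shorter, you can reduce this to O(min(n, m)) iterations.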
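One way to read that suggestion, as a minimal sketch rather than the answerer's own code: cap the candidate overlap at the length of the shorter string, so the loop runs at most min(n, m) times. The name `find_overlap_bounded` is hypothetical.

```python
def find_overlap_bounded(s1, s2):
    """Return the longest suffix of s1 that is also a prefix of s2, or None."""
    # An overlap can never be longer than the shorter of the two strings.
    max_len = min(len(s1), len(s2))
    # Try the longest candidate first and shrink it one character at a time.
    for k in range(max_len, 0, -1):
        if s1[-k:] == s2[:k]:
            return s1[-k:]
    return None


print(find_overlap_bounded("My name is Bogdan", "Bogdan and I am from Russia"))  # Bogdan
print(find_overlap_bounded("abc", "xyz"))  # None
```

Returning None explicitly when nothing matches makes the no-overlap case easy to test for.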
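If you expect the overlap to be much shorter than the shorter of the two strings, you can even make this O(k) iterations, where k is the length of the overlap, by starting with the smallest overlap: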
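def find_overlap(s1, s2):
    for i in range(1, len(s1) + 1):
        # Stop once the candidate would be longer than s2
        # (i == len(s2) is still a valid overlap and must be tested).
        if i > len(s2):
            return None
        test1, test2 = s1[-i:], s2[:i]
        if test1 == test2:
            return test1
    return None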
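Note that starting from the smallest overlap returns the shortest match, which can differ from the longest one when the text repeats. A small sketch to illustrate the difference; the helper names `shortest_overlap` and `longest_overlap` are hypothetical and not part of the answer above:

```python
def shortest_overlap(s1, s2):
    # Grow the candidate from 1 character upward; the first hit is the shortest.
    for k in range(1, min(len(s1), len(s2)) + 1):
        if s1[-k:] == s2[:k]:
            return s1[-k:]
    return None


def longest_overlap(s1, s2):
    # Shrink the candidate from the maximum length; the first hit is the longest.
    for k in range(min(len(s1), len(s2)), 0, -1):
        if s1[-k:] == s2[:k]:
            return s1[-k:]
    return None


print(shortest_overlap("xabab", "ababy"))  # ab
print(longest_overlap("xabab", "ababy"))   # abab
```

For the strings in the question both directions give 'Bogdan', so the choice only matters when a shorter suffix of the first string also happens to be a prefix of the second.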
You can use set intersection:
l1="My name is Bogdan"
l2="Bogdan and I am from Russia"
print(set(l1.split()) & set(l2.split()))  # {'Bogdan'}
List comprehension:
l1="My name is Bogdan"
l2="Bogdan and I am from Russia"
[i for i in l1.split() if i in l2.split()]
# ['Bogdan']
Another option, with a for loop:
def shared_words(s1, s2):
    res = []
    l_s1, l_s2 = set(s1.split()), set(s2.split())
    for ss1 in l_s1:
        if ss1 in l_s2: res.append(ss1)
    return res
Applied to the strings:
s1 = "My name is Bogdan"
s2 = "Bogdan and I am from Russia"
print(shared_words(s1, s2)) #=> ['Bogdan']
Or, using a regular expression to split out only the words:
import re
def shared_words(s1, s2):
    res = []
    l_s1, l_s2 = set(re.findall(r'\w+',s1)), set(re.findall(r'\w+',s2))
    for ss1 in l_s1:
        if ss1 in l_s2: res.append(ss1)
    return res
This gives:
s1 = "My name is Bogdan, I am here"
s2 = "Bogdan and I am from Russia."
print(shared_words(s1, s2)) #=> ['Bogdan', 'I', 'am'] (order may vary, since sets are unordered)
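Because the function above iterates over a set, the order of the result is not guaranteed. If the order matters, a possible variant (a sketch only; `shared_words_ordered` is a hypothetical name) walks the words of the first string in order and uses a set only for the membership test:

```python
import re


def shared_words_ordered(s1, s2):
    """Return words found in both strings, in the order they appear in s1."""
    words2 = set(re.findall(r"\w+", s2))  # fast membership test
    seen = set()                          # avoid reporting a word twice
    result = []
    for word in re.findall(r"\w+", s1):
        if word in words2 and word not in seen:
            seen.add(word)
            result.append(word)
    return result


s1 = "My name is Bogdan, I am here"
s2 = "Bogdan and I am from Russia."
print(shared_words_ordered(s1, s2))  # ['Bogdan', 'I', 'am']
```

The comparison is still case-sensitive, so 'i' and 'I' would count as different words.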