Get the best-matching sentence in a list
How can I find which sentence in a list best matches another sentence?
matchSentence = ["weather in", "weather at", "weather on"]
sentence = "weather on monday"
for item in matchSentence:
    ''' here, get the index of `item`
    if all the words in `item` are in `sentence`
    '''
I'm looking for a function that checks whether all the words of an item appear in sentence.
The desired result is: 2
matchSentence = ["weather in", "weather at", "weather on"]
sentence = "weather on monday"
maxCount = 0
maxCntInd = -1
wordSet1 = set(sentence.split())  # unique words in sentence
for index, item in enumerate(matchSentence):
    # Count how many of the words in `item` also occur in `sentence`.
    wordSet2 = set(item.split())  # unique words in item
    commonWords = len(wordSet2 & wordSet1)
    # Strict > keeps the first best match and leaves -1 when nothing matches.
    if commonWords > maxCount:
        maxCount = commonWords
        maxCntInd = index
print(maxCntInd)  # prints 2
You can use Python's in operator, which tests whether one string is a substring of another:
matchSentence = ["weather in", "weather at", "weather on"]
sentence = "weather on monday"
for item in matchSentence:
    if item in sentence:
        print(matchSentence.index(item))
Output:
2
However, this fails in many cases, for example:
matchSentence = ["weather's on", "weather is very hot at", "leather on"]
sentence = "weather on monday"
You can use the difflib module to handle such cases:
Example 1:
from difflib import SequenceMatcher
print(SequenceMatcher(None, "abc", "abc").ratio())
Output:
1.0
Example 2:
from difflib import SequenceMatcher
print(SequenceMatcher(None, "efg", "abc").ratio())
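Output:
0.0
As you can see, a ratio of 1.0 means the two strings are identical, and 0.0 means they have no characters in common. Values in between indicate partial similarity.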
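To apply this to a list of candidates, one option is to score each candidate against the sentence with SequenceMatcher and keep the one with the highest ratio. The following is only a minimal sketch: the helper name best_match_index is made up for illustration, and character-level similarity is just one possible definition of "best match".

```python
from difflib import SequenceMatcher


def best_match_index(candidates, sentence):
    """Return the index of the candidate most similar to sentence.

    Returns -1 if candidates is empty. (Hypothetical helper, not part
    of difflib.)
    """
    best_index, best_ratio = -1, -1.0
    for index, candidate in enumerate(candidates):
        # ratio() is a float in [0.0, 1.0]; higher means more similar.
        ratio = SequenceMatcher(None, candidate, sentence).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    return best_index


matchSentence = ["weather in", "weather at", "weather on"]
sentence = "weather on monday"
print(best_match_index(matchSentence, sentence))  # prints 2
```

Because the comparison is strict, a tie keeps the earliest candidate. If a minimum similarity is needed, a threshold check on best_ratio could be added before returning.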
One way to find the most similar sentence is to count, for each candidate, how many of its words appear in the target sentence.
matchSentence = ["weather in", "weather at", "weather on"]
targetSentence = "weather on monday"
targetSentence_words = targetSentence.split(" ")
mostSimilarSentence = matchSentence[0]
mostSimilarSentenceScore = 0
for searchSentence in matchSentence:
    similarityScore = 0
    for word in searchSentence.split(" "):
        if word in targetSentence_words:
            similarityScore += 1
    print(f"Sentence: '{searchSentence}' got score: {similarityScore}")
    if similarityScore > mostSimilarSentenceScore:
        mostSimilarSentence = searchSentence
        mostSimilarSentenceScore = similarityScore
print(f"Most similar sentence: {mostSimilarSentence}")
print(f"Most similar sentence score: {mostSimilarSentenceScore}")
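The question asks specifically for the index of an item whose words all appear in the sentence. Under that stricter reading, a set subset test may be enough. This is a sketch only, and the function name find_full_match_index is hypothetical:

```python
def find_full_match_index(candidates, sentence):
    """Return the index of the first candidate whose words all occur
    in sentence, or -1 if there is none. (Hypothetical helper.)"""
    sentence_words = set(sentence.split())
    for index, candidate in enumerate(candidates):
        # <= on sets is the subset test: every word of the candidate
        # must also be a word of the sentence.
        if set(candidate.split()) <= sentence_words:
            return index
    return -1


matchSentence = ["weather in", "weather at", "weather on"]
print(find_full_match_index(matchSentence, "weather on monday"))  # prints 2
```

Unlike the scoring approach, this returns -1 rather than a partial match when no candidate is fully contained in the sentence.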