Matching two strings that contain the same words, from left to right, in Python

I'm trying to find a way to compare two strings in Python and tell whether they match or are similar.

Example:

import difflib

from fuzzywuzzy import fuzz

string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'

# Test with fuzzywuzzy
print(fuzz.partial_ratio(string1, string2))  # 50
print(fuzz.partial_ratio(string1, string3))  # 100
print(fuzz.partial_ratio(string2, string3))  # 58

# Test with difflib's SequenceMatcher
print(difflib.SequenceMatcher(None, string1, string2).ratio())  # 0.34782608695652173
print(difflib.SequenceMatcher(None, string1, string3).ratio())  # 0.5641025641025641
print(difflib.SequenceMatcher(None, string2, string3).ratio())  # 0.45

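In the example above, all three strings should count as similar, because they all contain the same words, "green apple". Is there a matching algorithm that matches strings containing the same words regardless of word order, works from left to right, and ignores any words that come after the match, as with string1 and string3?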

fuzzywuzzy also has a function called partial_token_set_ratio. I think it will solve your problem:

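from fuzzywuzzy import fuzz

string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'

# Word order and trailing words do not lower the score
print(fuzz.partial_token_set_ratio(string1, string3))  # 100
print(fuzz.partial_token_set_ratio(string1, string2))  # 100

# A single shared word scores 100, whichever argument comes first
string4 = "apple"
print(fuzz.partial_token_set_ratio(string1, string4))  # 100
print(fuzz.partial_token_set_ratio(string4, string1))  # 100

# A fragment of a word also scores 100
string4 = "app"
print(fuzz.partial_token_set_ratio(string4, string1))  # 100

# A misspelled word lowers the score
string4 = "appld"
print(fuzz.partial_token_set_ratio(string4, string1))  # 80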
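
Note that a single shared word ("apple"), or even a fragment of one ("app"), already scores 100 above, so this function may be more permissive than you want. If you need a stricter yes/no check, here is a minimal standard-library sketch. It is not how fuzzywuzzy works internally; the helper names words and contains_same_words are made up for this illustration, and it assumes "similar" means that every word of the first string is a prefix of some word in the second, so that "apple" matches "apples":

```python
import re


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_same_words(query, candidate):
    """Return True if every word of `query` is a prefix of some word in `candidate`.

    Word order, punctuation, and extra words in `candidate` are ignored.
    (Hypothetical helper for illustration, not part of fuzzywuzzy or difflib.)
    """
    candidate_words = words(candidate)
    return all(
        any(c.startswith(q) for c in candidate_words)
        for q in words(query)
    )


string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'

print(contains_same_words(string1, string2))  # True
print(contains_same_words(string1, string3))  # True
print(contains_same_words(string2, string3))  # True
print(contains_same_words(string3, string1))  # False
```

The check is deliberately asymmetric: string3 against string1 is False because "apples", "grow", "on" and "trees" have no counterpart in string1. Prefix matching is a crude stand-in for stemming, and an empty query matches everything, so treat this as a starting point rather than a finished solution.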