How to extract full text from a list with fuzzywuzzy?

Question

Here is my code:

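from fuzzywuzzy import fuzz

# Open the file for reading ("a" is append mode, which cannot be read from),
# assuming it is UTF-8, and strip the trailing newline from each line
with open("text.txt", "r", encoding="utf-8") as check:
    possible_words = [line.strip() for line in check]

MIN_MATCH_SCORE = 30
heard_word = 'i5-1135G7 '

guessed_word = [word for word in possible_words
                if fuzz.ratio(heard_word, word) >= MIN_MATCH_SCORE]
print('this one - ', guessed_word)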

Expected output:

 11th Generation Intel® Core™ i5-1135G7 Processor

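Is it possible to get the whole sentence shown in the expected output by giving only 'i5-1135G7 '? Is there another solution that would achieve what I expect? Thanks in advance.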

Here is the link to text.txt:
https://drive.google.com/file/d/1Mo3qFmeOAqa3WPPyg8SpeFVSjDx7AQBj/view

Answer 1

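fuzz.ratio scores the two strings as a whole, so a short query against a long sentence gets a low score even when the query appears in it verbatim. To compensate for the longer sentences and match at the word level, use token_set_ratio instead: it compares sets of word tokens, so extra words in a longer sentence do not lower the score as long as every query token appears in it. Also, if you want every query word to overlap completely, raise MIN_MATCH_SCORE to close to 100.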
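To see why fuzz.ratio fails here: when python-Levenshtein is not installed, fuzzywuzzy falls back to difflib, and fuzz.ratio is then 100 * SequenceMatcher(None, s1, s2).ratio(), rounded, i.e. 200·M/(len(s1) + len(s2)) where M is the number of matching characters. Since M can never exceed the length of the shorter string, a 10-character query cannot score well against a 124-character line. The sketch below uses only the standard library; approx_ratio and max_possible_ratio are hypothetical helper names, and with python-Levenshtein installed the real scores can differ slightly.

```python
from difflib import SequenceMatcher


def approx_ratio(s1: str, s2: str) -> int:
    """Approximation of fuzz.ratio using difflib (fuzzywuzzy's fallback matcher)."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def max_possible_ratio(s1: str, s2: str) -> int:
    """Upper bound: at most min(len) characters can match, so score <= 200*min/(len1+len2)."""
    return int(round(200 * min(len(s1), len(s2)) / (len(s1) + len(s2))))


heard_word = 'i5-1135G7 '
line = ('11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to  4.20 GHz '
        'with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)')

# The query occurs verbatim in the line, yet the score is far below MIN_MATCH_SCORE = 30
print(approx_ratio(heard_word, line))        # 15
# Even a perfect substring match cannot do better for strings of these lengths
print(max_possible_ratio(heard_word, line))  # 15
```

Lowering the threshold is not a real fix either: the bound shrinks further as lines get longer, while unrelated short lines can still pass.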

from fuzzywuzzy import fuzz

MIN_MATCH_SCORE = 90
heard_word = 'i5-1135G7'

possible_words = ['11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to  4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)',
                  'windows 10 64 bit', 'intel i7']

print([word for word in possible_words
       if fuzz.token_set_ratio(heard_word, word) >= MIN_MATCH_SCORE])

Output:

['11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to  4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)']
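The reason token_set_ratio gives 100 for the first candidate can be shown with a simplified, stdlib-only re-implementation of its idea (not fuzzywuzzy's actual code): tokenize both strings, then compare the sorted shared tokens against "shared + leftover" tokens for each side and keep the best score. If every query token appears in the candidate, the query side has no leftovers and that comparison is a perfect match. token_set_score, _tokens and _ratio are hypothetical names, and the preprocessing only roughly mirrors fuzzywuzzy's default (non-word characters become spaces, text is lowercased).

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    # Whole-string similarity on a 0-100 scale (difflib stand-in for fuzz.ratio)
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def _tokens(s: str) -> set:
    # Non-word characters (including '-', '®', '™') become spaces; then lowercase and split
    return set(re.sub(r"\W", " ", s).lower().split())


def token_set_score(s1: str, s2: str) -> int:
    """Simplified sketch of the idea behind fuzz.token_set_ratio."""
    t1, t2 = _tokens(s1), _tokens(s2)
    if not t1 or not t2:
        return 0
    sect = " ".join(sorted(t1 & t2))    # tokens both strings share
    only1 = " ".join(sorted(t1 - t2))   # tokens only in s1
    only2 = " ".join(sorted(t2 - t1))   # tokens only in s2
    combined_1to2 = (sect + " " + only1).strip()
    combined_2to1 = (sect + " " + only2).strip()
    # If every token of s1 is in s2, combined_1to2 == sect and the first comparison is 100
    return max(_ratio(sect, combined_1to2),
               _ratio(sect, combined_2to1),
               _ratio(combined_1to2, combined_2to1))


MIN_MATCH_SCORE = 90
heard_word = 'i5-1135G7'
possible_words = ['11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to  4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)',
                  'windows 10 64 bit', 'intel i7']

matches = [w for w in possible_words if token_set_score(heard_word, w) >= MIN_MATCH_SCORE]
print(matches)  # only the full Intel processor line survives
```

One caveat follows from the same logic: a candidate whose tokens are all contained in the query (for example a line consisting only of "i5") also scores 100. If you only need the single best line, fuzzywuzzy's `process.extractOne(heard_word, possible_words, scorer=fuzz.token_set_ratio, score_cutoff=MIN_MATCH_SCORE)` returns the best `(choice, score)` pair, or `None` when nothing reaches the cutoff.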

Answer 2

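# token_set_ratio works fine!
from fuzzywuzzy import fuzz

# df1 is not defined in this answer; it is used as a pandas DataFrame whose
# cells are strings, and each row is joined into one comma-separated string
s = []
for l in df1.values:
    l = ', '.join(l)
    s.append(l)

s = ', '.join(s)
# g is not defined in this answer either; it must be an iterable of candidate strings
main = [x for x in g if x]  # drop empty strings
MIN_MATCH_SCORE = 60
heard_word = 'i5-11th gen'
guessed_word = [word for word in main
                if fuzz.token_set_ratio(heard_word, word) >= MIN_MATCH_SCORE]
print('this one - ', guessed_word)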
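Why a threshold of 60 (rather than Answer 1's 90) is needed for this query can be worked out by hand, assuming the ratio behaves as 2M/T as above. After preprocessing, 'i5-11th gen' becomes the tokens {i5, 11th, gen}. Against the expected-output sentence, the shared tokens are 11th and i5, and gen is left over because it is only a prefix of Generation, not a shared token. The highest of the three pairwise comparisons is then between the sorted intersection "11th i5" (7 characters) and "11th i5 gen" (11 characters), with all 7 characters of the shorter string matching:

```latex
\text{score} = 100 \cdot \frac{2M}{|a| + |b|}
             = 100 \cdot \frac{2 \cdot 7}{7 + 11}
             \approx 78
```

So this query clears MIN_MATCH_SCORE = 60 but would be rejected at 90.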


Tags: python, nlp, machine-learning, nltk, fuzzywuzzy