Strange behaviour in FuzzyWuzzy extract
I'm trying to use FuzzyWuzzy to correct misspelled names in a text, but I can't get process.extract and process.extractOne to behave the way I expect.
from fuzzywuzzy import process
the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'
the_text = the_text.split()
found_word = process.extract(search_term, the_text)
print(found_word)
This results in:
[('e', 90), ('VEIGA', 80), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
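Why `'e'` outranks `'VEIGA'`: process.extract uses fuzz.WRatio as its default scorer. When the two strings differ in length by a factor of 1.5 or more, WRatio also tries a partial (substring) match and scales it down (by 0.9, or 0.6 above a factor of 8). `'e'` occurs inside `'veyga'`, so its partial score is 100, scaled to 90, which beats the plain ratio of 80 for `'VEIGA'`. The sketch below is a simplified stdlib re-creation of that logic for single words, assuming fuzzywuzzy's pure-Python fallback built on `difflib.SequenceMatcher`; the helper names are hypothetical and it is not the library's actual code.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """0-100 similarity from difflib (hypothetical stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_partial_ratio(a, b):
    """Best ratio of the shorter string against same-length windows of the longer one."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    best = 0
    for i, j, _size in SequenceMatcher(None, shorter, longer).get_matching_blocks():
        start = max(0, j - i)
        window = longer[start:start + len(shorter)]
        best = max(best, simple_ratio(shorter, window))
    return best


def simplified_wratio(query, choice):
    """Simplified WRatio for single words (hypothetical helper, not the real scorer)."""
    a, b = query.lower(), choice.lower()
    if not a or not b:
        return 0
    base = simple_ratio(a, b)
    length_ratio = max(len(a), len(b)) / min(len(a), len(b))
    if length_ratio < 1.5:
        # Similar lengths: only the whole-string ratio matters here.
        return base
    # Very different lengths: substring matching kicks in, scaled down a little.
    scale = 0.6 if length_ratio > 8 else 0.9
    return round(max(base, simple_partial_ratio(a, b) * scale))


for word in ["e", "VEIGA"]:
    print(word, simplified_wratio("VEYGA", word))  # e 90, then VEIGA 80
```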
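How can I get FuzzyWuzzy to correctly identify 'VEIGA' as the right match?
You could try using fuzz.token_set_ratio or fuzz.token_sort_ratio.
The answer here gives a good explanation.
For completeness, here is some code: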
from fuzzywuzzy import process
from fuzzywuzzy import fuzz
the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'
the_text = the_text.split()
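# token_sort_ratio compares whole (token-sorted) strings, so a one-letter word no longer gets a high partial-match score
found_word = process.extract(search_term, the_text, scorer=fuzz.token_sort_ratio)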
print(found_word)
Output:
[('VEIGA', 80), ('e', 33), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
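Since the goal is to correct names, the single best match is usually what you want. In fuzzywuzzy that would be `process.extractOne(search_term, the_text, scorer=fuzz.token_sort_ratio, score_cutoff=70)`, where the cutoff of 70 is an arbitrary example value. The stdlib sketch below approximates that ranking so the behaviour can be checked without the library; `token_sort_score`, `extract` and `extract_one` are hypothetical helpers that assume difflib-based scoring and simplified normalisation, not fuzzywuzzy's exact implementation.

```python
import re
from difflib import SequenceMatcher


def token_sort_score(query, choice):
    """Approximate fuzz.token_sort_ratio: normalise, sort tokens, compare (hypothetical)."""
    def normalise(s):
        # Lower-case, turn non-word characters into spaces, then sort the tokens.
        return " ".join(sorted(re.sub(r"\W+", " ", s.lower()).split()))
    a, b = normalise(query), normalise(choice)
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract(query, choices, limit=5):
    """Rank choices best-first, like process.extract (hypothetical)."""
    scored = [(choice, token_sort_score(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def extract_one(query, choices, score_cutoff=0):
    """Best (choice, score) pair, or None if it scores below score_cutoff (hypothetical)."""
    best = extract(query, choices, limit=1)
    if best and best[0][1] >= score_cutoff:
        return best[0]
    return None


words = "VICTOR HUGO e MARIANA VEIGA".split()
print(extract("VEYGA", words))
# [('VEIGA', 80), ('e', 33), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
print(extract_one("VEYGA", words, score_cutoff=70))
# ('VEIGA', 80)
```

The cutoff lets you skip a "correction" when no candidate is close enough, instead of always replacing a word with whatever ranks first.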