Setting a Threshold for fuzzywuzzy process.extractOne

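I am currently doing some product string similarity matching between two different retailers, and I am using the fuzzywuzzy process.extractOne function to find the best match.

However, I want to be able to set a score threshold so that a product is only matched when its score is above that threshold. At the moment, every product is simply matched to its closest string, however poor that match is.

The code below gives the best match (currently getting errors):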

title, index, score = process.extractOne(text, choices_dict)

I then tried the following code to set a threshold:

title, index, score = process.extractOne(text, choices_dict, score_cutoff=80)

This results in the following TypeError:

TypeError: cannot unpack non-iterable NoneType object

Finally, I also tried the code below:

title, index, scorer, score = process.extractOne(text, choices_dict, scorer=fuzz.token_sort_ratio, score_cutoff=80)

This results in the following error:

ValueError: not enough values to unpack (expected 4, got 3)

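process.extractOne returns None when the best score is below score_cutoff, and unpacking None is what raises the TypeError. When a match is found in a dict of choices, the result is a 3-tuple of (value, score, key), so unpacking it into four names raises the ValueError. You therefore have to either check for None or catch the exception: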

best_match = process.extractOne(text, choices_dict, score_cutoff=80)
if best_match:
    value, score, key = best_match
    print(f"best match is {key}:{value} with the similarity {score}")
else:
    print("no match found")

Alternatively, catch the TypeError raised by unpacking None:

try:
    value, score, key = process.extractOne(text, choices_dict, score_cutoff=80)
    print(f"best match is {key}:{value} with the similarity {score}")
except TypeError:
    print("no match found")
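
To apply the same None check across many product titles, something like the sketch below may help. It is written to run without fuzzywuzzy installed: `extract_one_stdlib` is a hypothetical stand-in built on the standard library's difflib that only mimics the (value, score, key) return shape and the score_cutoff behaviour described above. It is not fuzzywuzzy's implementation, and its scores will differ from fuzzywuzzy's. The helper `match_products` and the sample data are likewise made up for illustration.

```python
from difflib import SequenceMatcher


def extract_one_stdlib(query, choices, score_cutoff=0):
    """Hypothetical stand-in for process.extractOne on a dict of choices.

    Returns (value, score, key) for the best-scoring choice, or None when
    that score is below score_cutoff. Scores come from difflib, so they
    will not equal fuzzywuzzy's scores.
    """
    best = None
    for key, value in choices.items():
        # ratio() is in [0, 1]; scale it to 0-100 like fuzzywuzzy scores.
        ratio = SequenceMatcher(None, query.lower(), value.lower()).ratio()
        score = round(100 * ratio)
        if best is None or score > best[1]:
            best = (value, score, key)
    if best is None or best[1] < score_cutoff:
        return None
    return best


def match_products(titles, choices, score_cutoff=80):
    """Map each title to (value, score, key), or None if nothing clears the cutoff."""
    return {t: extract_one_stdlib(t, choices, score_cutoff) for t in titles}


# Made-up sample data: keys are product ids, values are product titles.
choices_dict = {101: "Apple iPhone 12 64GB Black", 102: "Samsung Galaxy S21 128GB"}
results = match_products(["apple iphone 12 64gb black", "Garden hose 15m"], choices_dict)

for title, match in results.items():
    if match is None:
        print(f"{title!r}: no match found")
    else:
        value, score, key = match
        print(f"{title!r}: best match is {key}:{value} with the similarity {score}")
```

With fuzzywuzzy available, the body of `match_products` could presumably call process.extractOne(t, choices, score_cutoff=score_cutoff) instead of the stand-in, since both return None below the cutoff. Storing None for unmatched titles keeps the unpacking in one place, after the check.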