Accept "close matches" when using strings in a Python function?

Question

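I am trying to use a shortest-path function to find the distance between strings in a graph. The problem is that sometimes I want to count close matches. For example, I would like "communication" to count as "communications", or "networking device" to count as "network device". Is there a way to do this in Python? (For example, extracting word roots, or computing a string distance, or perhaps a Python library that already has word-form relationships such as plural/gerund/misspelled/etc.) My problem right now is that my process only works when there is an exact match for every item in my database, which is hard to keep clean.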

For example:

List_of_tags_in_graph = ['A', 'list', 'of', 'tags', 'in', 'graph']

given_tag = 'lists'

if min_fuzzy_string_distance_measure(given_tag, List_of_tags_in_graph) < threshold:
    index_of_min = index_of_min_fuzzy_match(given_tag, List_of_tags_in_graph)
    given_tag = List_of_tags_in_graph[index_of_min]

#... then use given_tag in the graph calculation because now I know it matches ...

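Any thoughts on an easy or quick way to do this? Or perhaps a different way to think about accepting close matches... or just better error handling when the strings don't match?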

Answer 1

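Try NLTK's WordNetLemmatizer, which is designed to reduce words to their root form (lemma). It handles inflected forms such as plurals, but it does not correct misspellings. https://www.nltk.org/_modules/nltk/stem/wordnet.html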
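If misspellings matter too, a string-similarity check may be a simpler fit for the pseudocode in the question. Below is a minimal sketch using only the standard library's difflib.get_close_matches. The helper name normalize_tag and the cutoff of 0.8 are my own assumptions for illustration, not something from the question, and the cutoff would likely need tuning against real tags.

```python
import difflib


def normalize_tag(given_tag, known_tags, cutoff=0.8):
    """Return the closest known tag, or None if nothing is similar enough.

    `normalize_tag` and the default `cutoff` are illustrative assumptions.
    """
    # get_close_matches scores candidates with a similarity ratio between
    # 0.0 and 1.0 and drops anything scoring below `cutoff`.
    matches = difflib.get_close_matches(given_tag, known_tags, n=1, cutoff=cutoff)
    return matches[0] if matches else None


list_of_tags_in_graph = ['A', 'list', 'of', 'tags', 'in', 'graph']

given_tag = normalize_tag('lists', list_of_tags_in_graph)
if given_tag is None:
    # No close match: handle the error here instead of failing inside
    # the graph calculation.
    print('no close match found')
else:
    print(given_tag)
```

Returning None rather than raising keeps the "no match" case explicit, so the caller can decide whether to skip the tag, log it, or add it to the graph.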

python

string

nlp

fuzzy-comparison

stringdist