Regex match item in Python Counter

Question

I have a Python Counter. Here is some sample code:

from collections import Counter

lst = ['item', 'itemm', 'iitem', 'foo', 'bar']
c = Counter(lst)
print(c)
# Counter({'bar': 1, 'foo': 1, 'iitem': 1, 'item': 1, 'itemm': 1})  (key order may vary by Python version)

If I do c['item'], I get 1, but because of the spelling mistakes in the list I want to get 3.

I tried the following. It doesn't give me 3, but I'm still using it:

import re

for word in lst:
    if re.search('item',word):
        print(word,c[word])

item 1
itemm 1
iitem 1

Is there a more efficient way to do this without looping through the list?

Answer 1

You can use a list comprehension together with sum:

>>> import re
>>> d = {'bar': 1, 'foo': 1, 'iitem': 1, 'item': 1, 'itemm': 1}
>>> sum([d[i] for i in d.keys() if re.search(r'item', i)])
3

Or, without regular expressions:

>>> sum([d[i] for i in d.keys() if 'item' in i])
3
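The same idea can be wrapped in a small reusable helper. This is a minimal sketch, assuming Python 3 and that the counts should be summed over the Counter's distinct keys rather than over the original list; `count_matching` is a hypothetical name, not part of the standard library. It still loops, but only over unique words, and passing a generator to sum avoids building an intermediate list.

```python
import re
from collections import Counter


def count_matching(counter, pattern):
    """Sum the counts of every key in `counter` that matches the regex `pattern`."""
    # Compile once so the pattern is not re-parsed for every key.
    regex = re.compile(pattern)
    # Iterate over (word, count) pairs of distinct keys; the generator
    # expression feeds sum() directly without building a list.
    return sum(count for word, count in counter.items() if regex.search(word))


lst = ['item', 'itemm', 'iitem', 'foo', 'bar']
c = Counter(lst)
print(count_matching(c, r'item'))    # 3
print(count_matching(c, r'^item$'))  # 1
```

Note that re.search matches anywhere in the string, so r'item' would also count words such as 'items'; anchor the pattern (as in r'^item$') when you need an exact match.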

Answer 2

Let me give more detail on solving approximate matching of strings, which is the underlying problem here.

Spelling mistakes can be matched with an edit-distance check (the so-called Levenshtein distance metric). It can be calculated using the python-Levenshtein package:

from Levenshtein import distance
edit_dist = distance("ah", "aho")  # 1: a single insertion turns "ah" into "aho"
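To apply edit distance to the Counter from the question, you can sum the counts of all keys within a small distance of the target word. The sketch below assumes the python-Levenshtein package might not be installed, so it uses a hand-written dynamic-programming `levenshtein` function instead; both `levenshtein` and `fuzzy_count` are hypothetical helper names (not the package's API), and the threshold `max_dist=1` is an assumption you would tune for your data.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    # prev holds the distances from the previous prefix of `a` to every prefix of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_count(counter, target, max_dist=1):
    """Sum the counts of keys that are within `max_dist` edits of `target`."""
    return sum(count for word, count in counter.items()
               if levenshtein(word, target) <= max_dist)


c = Counter(['item', 'itemm', 'iitem', 'foo', 'bar'])
print(levenshtein("ah", "aho"))  # 1
print(fuzzy_count(c, 'item'))    # 3
```

If python-Levenshtein is installed, `distance(word, target)` can replace the hand-written `levenshtein(word, target)` call; the C implementation will be considerably faster on large vocabularies.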

The example is taken from a question on SO that references this particular module.

Another reference: fuzzy string matching in Python.
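If you prefer to avoid a third-party dependency, the standard library's difflib offers fuzzy matching too. Note that it scores candidates with SequenceMatcher's similarity ratio, not Levenshtein distance, so this is a swapped-in technique rather than the one described above. A minimal sketch, assuming a cutoff of 0.8 is appropriate for your words:

```python
import difflib
from collections import Counter

c = Counter(['item', 'itemm', 'iitem', 'foo', 'bar'])

# get_close_matches returns at most n candidates whose similarity ratio is
# >= cutoff; passing n=len(c) lifts the default limit of 3 results.
matches = difflib.get_close_matches('item', c.keys(), n=len(c), cutoff=0.8)
total = sum(c[w] for w in matches)
print(sorted(matches))  # ['iitem', 'item', 'itemm']
print(total)            # 3
```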


Tags: python, regex, counter, list