How to count words by grouping similar words in Python?

Question

I have a list:

list_1 = ['warning', 'media', 'media-other','media-other','warning-type2','threat','threat-type1']

I need to count the occurrences of each type, using the keys in this dictionary:

dict_1 = {'warning':0, 'media':0, 'threat':0}

I need to group similar types together and increment the count: media and media-other should both count as media, and warning and warning-type2 should both count as warning.

After counting, dict_1 should be {'warning':2, 'media':3, 'threat':2}

Answer 1

Assuming the part before the first hyphen gives you the 'type' of each item in the list, you can use split and collections.Counter to count them:

from collections import Counter
Counter(word.split("-")[0] for word in list_1)
# returns Counter({'media': 3, 'warning': 2, 'threat': 2})
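
If you want the result as a plain dict with exactly the keys from dict_1 (so a type that never appears stays at 0), one possible sketch is below. It assumes, as above, that the type is whatever precedes the first hyphen, and it ignores any type that is not already a key of dict_1. The name result is only illustrative.

```python
from collections import Counter

list_1 = ['warning', 'media', 'media-other', 'media-other',
          'warning-type2', 'threat', 'threat-type1']
dict_1 = {'warning': 0, 'media': 0, 'threat': 0}

# Assumption: the type is the text before the first hyphen.
counts = Counter(word.split('-', 1)[0] for word in list_1)

# Keep only the keys already in dict_1; a Counter returns 0 for a missing key.
result = {key: counts[key] for key in dict_1}
print(result)  # {'warning': 2, 'media': 3, 'threat': 2}
```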

Answer 2

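list_1 = ['warning', 'media', 'media-other','media-other','warning-type2','threat','threat-type1']

# Keep only the part before the first hyphen, i.e. the type
list_2 = [x.split('-')[0] for x in list_1]
dict_1 = {}
for key in list_2:
    # Count each type once, the first time it is seen
    if key not in dict_1:
        dict_1[key] = list_2.count(key)
print(dict_1)  # {'warning': 2, 'media': 3, 'threat': 2}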
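
Note that list_2.count(key) rescans the whole list for every distinct type. For a long list, a single-pass variant along these lines may be preferable. This is only a sketch under the same assumption that the type is the text before the first hyphen.

```python
list_1 = ['warning', 'media', 'media-other', 'media-other',
          'warning-type2', 'threat', 'threat-type1']

dict_1 = {}
for word in list_1:
    # Assumption: the type is the text before the first hyphen.
    key = word.split('-', 1)[0]
    # dict.get supplies 0 the first time a type is seen.
    dict_1[key] = dict_1.get(key, 0) + 1

print(dict_1)  # {'warning': 2, 'media': 3, 'threat': 2}
```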

python

list-comprehension

word-count