Count the number of times a group of words appears in a text
I have 4 lists of words and a text that has been split into words.
animals = ["cat", "dog", "fish"]
colours = ["blue", "red", "green"]
food = ["pasta", "chips", "beef"]
sport = ["football", "basketball", "tennis"]
text = ["Once","upon","a","time",.......]
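I want to count how many times the words from these lists appear in a given text, but as a total per list. So the result would show that the whole text contains, for example, 10 animal words, 20 colour words, 6 food words and 13 sport words.
The data I'm actually working with is very large, so I need something that runs fast.
Thanks for your help!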
animalOccurences = 0
for word in text:
    if word in animals:
        animalOccurences += 1
Here I loop over every word in the text list and check whether that word is in the animals list. If it is, I add 1 to the animalOccurences variable.
You can change the categories into a dict of set objects (sets allow O(1) membership tests):
categories = {'animals': {'cat', 'dog', 'fish'},
              'colours': {'blue', 'green', 'red'},
              'food': {'beef', 'chips', 'pasta'},
              'sport': {'basketball', 'football', 'tennis'}}
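If the four lists from the question already exist, the same structure can be built from them instead of being retyped. This is a minimal sketch that assumes the list names used in the question:

```python
animals = ["cat", "dog", "fish"]
colours = ["blue", "red", "green"]
food = ["pasta", "chips", "beef"]
sport = ["football", "basketball", "tennis"]

# Convert each list to a set once, up front, so that later
# membership tests are O(1) on average instead of scanning a list.
categories = {
    "animals": set(animals),
    "colours": set(colours),
    "food": set(food),
    "sport": set(sport),
}
```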
Then iterate over the words and run a membership test against each category set:
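def count_words(text, categories):
    # Start every category at zero.
    counts = dict.fromkeys(categories, 0)
    for word in text:
        for cat_name, cat_words in categories.items():
            # `word in cat_words` is a bool; adding it adds 1 (True) or 0 (False).
            counts[cat_name] += word in cat_words
    return counts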
Usage:
In [19]: text = "Once upon a time there was a proper minimal reproducible example given by the OP without anybody having to ask for it".split()
In [20]: count_words(text, categories)
Out[20]: {'animals': 0, 'colours': 0, 'food': 0, 'sport': 0}
In [21]: text = ("cat dog fish "*3).split()
In [22]: count_words(text, categories)
Out[22]: {'animals': 9, 'colours': 0, 'food': 0, 'sport': 0}
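Since the question mentions very large data, one possible variation is to tally each distinct word once with collections.Counter and then sum the tallies per category. This is only a sketch and not part of the answer above: the name count_words_counter is made up for illustration, and whether it is actually faster depends on how often words repeat in the text, so it is worth timing on a real sample.

```python
from collections import Counter


def count_words_counter(text, categories):
    # One pass over the text: tally how often each distinct word occurs.
    word_counts = Counter(text)
    # For each category, add up the tallies of its words.
    # A Counter returns 0 for missing keys, so absent words add nothing.
    return {
        cat_name: sum(word_counts[word] for word in cat_words)
        for cat_name, cat_words in categories.items()
    }


categories = {
    "animals": {"cat", "dog", "fish"},
    "colours": {"blue", "green", "red"},
    "food": {"beef", "chips", "pasta"},
    "sport": {"basketball", "football", "tennis"},
}

text = ("cat dog fish " * 3 + "red tennis tennis").split()
print(count_words_counter(text, categories))
# {'animals': 9, 'colours': 1, 'food': 0, 'sport': 2}
```

With this approach the per-category work scales with the size of the word lists rather than with the length of the text, which may help when the text is long and the lists are short.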