Optimizing counting occurrences of a list of words in a given string (Python)
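I'm writing a function that counts how many times each of the searched_words occurs in a passed string. The result is a dictionary with the matched words as keys and their occurrence counts as values.
I've written a function that does this, but it is poorly optimized.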
def get_words(string, searched_words):
    words = string.split()
    # O(nm) where n is length of words and m is length of searched_words
    found_words = [x for x in words if x in searched_words]
    # O(n^2) where n is length of found_words
    words_dict = {}
    for word in found_words:
        words_dict[word] = found_words.count(word)
    return words_dict

print(get_words('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# Results in {'pizza': 2, 'cool': 3}
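I tried using Counter from Python's collections module, but couldn't seem to reproduce the desired output. It also seems that using the set data type could solve my optimization problem, but I'm not sure how to count word occurrences when using a set.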
You're right in thinking that there's a good solution using Counter:
from collections import Counter
string = 'pizza pizza is very cool cool cool'
search_words = ['cool', 'pizza']
word_counts = Counter(string.split())
# If you want to get a dict only containing the counts of words in search_words:
search_word_counts = {wrd: word_counts[wrd] for wrd in search_words}
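One thing to note: with the dict comprehension above, a searched word that never occurs in the string gets a count of 0, whereas the function in the question leaves such words out. If you want the original behavior, a minimal sketch could look like this. It reuses the question's get_words name, and the membership check is my own addition rather than part of the snippet above:

```python
from collections import Counter

def get_words(string, searched_words):
    # Count every whitespace-separated token in one pass over the string.
    word_counts = Counter(string.split())
    # Keep only the searched words that actually occur (assumption: absent
    # words should be omitted, as in the question's original function).
    return {word: word_counts[word] for word in searched_words if word in word_counts}

print(get_words('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# {'cool': 3, 'pizza': 2}
```

The keys come out in the order of searched_words rather than in order of first appearance, which shouldn't matter for a plain dict lookup.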
Alternatively, you can build the counts with a list comprehension and then create the dictionary from a zip:
def get_words(string, searched_words):
    wordlist = string.split()
    wordfreq = [wordlist.count(p) for p in searched_words]
    return dict(list(zip(searched_words, wordfreq)))
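This is shorter, does away with the extra for loop, and needs no additional import, but it does apply dict to list to zip (the list call is not strictly needed, since dict accepts the zip directly).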
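Keep in mind that wordlist.count(p) rescans the whole list for every searched word, so this version is still O(nm). As for the set idea from the question, here is a rough sketch of how it could be combined with Counter to get a single pass over the words. The names count_searched and wanted are made up for illustration:

```python
from collections import Counter

def count_searched(string, searched_words):
    # A set gives O(1) average-case membership tests, versus O(m) for a list.
    wanted = set(searched_words)
    # Count only the tokens that are in the set; one pass over the words.
    return dict(Counter(word for word in string.split() if word in wanted))

print(count_searched('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# {'pizza': 2, 'cool': 3}
```

Under these assumptions the work is roughly O(n + m): building the set costs O(m) and the counting pass costs O(n). Words that never occur are simply absent from the result, which matches the output shown in the question.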