Optimizing counting occurrences of a list of words in a given string (Python)

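I'm writing a function that counts the occurrences of searched_words in a given string. The result is a dictionary with the matched words as keys and their occurrence counts as values.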

I've written a function that does this, but it is poorly optimized.

def get_words(string, searched_words):
    words = string.split()

    # O(nm) where n is length of words and m is length of searched_words
    found_words = [x for x in words if x in searched_words]

    # O(n^2) where n is length of found_words
    words_dict = {}
    for word in found_words:
        words_dict[word] = found_words.count(word)

    return words_dict


print(get_words('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# Results in {'pizza': 2, 'cool': 3}

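I tried using Counter from Python's collections module, but couldn't reproduce the desired output. It also seems that a set could solve my optimization problem, but I'm not sure how to count word occurrences when using sets.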

You're right that there is a good solution using Counter:
from collections import Counter

string = 'pizza pizza is very cool cool cool'
search_words = ['cool', 'pizza']
word_counts = Counter(string.split())

# If you want to get a dict only containing the counts of words in search_words:
search_word_counts = {wrd: word_counts[wrd] for wrd in search_words}
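
Indexing a Counter with a missing key returns 0 instead of raising KeyError, so the comprehension above includes every searched word, even ones that never occur in the string. As a minimal sketch, assuming you want the same output as the function in the question (only words that actually appear), you could filter on membership. The name count_searched_words is illustrative, not from the original post:

```python
from collections import Counter


def count_searched_words(string, searched_words):
    # One pass over the text to count every word: O(n).
    word_counts = Counter(string.split())
    # Keep only the searched words that actually occur: O(m).
    return {w: word_counts[w] for w in searched_words if w in word_counts}


print(count_searched_words('pizza pizza is very cool cool cool',
                           ['cool', 'pizza', 'salad']))
# {'cool': 3, 'pizza': 2}
```

The keys follow the order of searched_words, so the result may be ordered differently from the original function, although the two dictionaries compare equal.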

Alternatively, you can build the counts with a list comprehension and then create the dictionary from zip:
def get_words(string, searched_words):
    wordlist = string.split()
    wordfreq = [wordlist.count(p) for p in searched_words]
    return dict(list(zip(searched_words, wordfreq)))

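This is shorter, removes the extra for loop, and needs no import, but it requires applying dict to list(zip(...)). Each wordlist.count(p) call also scans the whole word list, so the running time is still O(nm).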
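
On the set idea raised in the question: a rough sketch, assuming a plain dict is acceptable for the tallies, is to turn searched_words into a set for fast membership tests and count in a single pass. The function name get_words_with_set is hypothetical:

```python
def get_words_with_set(string, searched_words):
    # Set membership tests are O(1) on average, versus O(m) for a list.
    searched = set(searched_words)
    counts = {}
    for word in string.split():
        if word in searched:
            # dict.get supplies 0 the first time a word is seen.
            counts[word] = counts.get(word, 0) + 1
    return counts


print(get_words_with_set('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# {'pizza': 2, 'cool': 3}
```

This runs in roughly O(n + m) and, like the original function, only includes words that actually occur in the string.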