How to count the number of times a text appears in multiple cells using Python

Question

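I want to find a way to count how many times similar words appear across multiple rows. For example, 'Street' appears, and 'Carla' appears twice. (* Note --> there are many such rows, and I don't know in advance which words are common.)

Please help.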

Answer 1

I'm not sure what format your data is in, but let's assume it is a pandas DataFrame.

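First, convert the column to a list, dropping missing cells so that every element is a string:

rows = df["Description"].dropna().astype(str).tolist()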

Create one large list to hold all the words:

large_list = []

Loop over the rows, split each row on whitespace, and append that row's words to the large list:

for row in rows:
    large_list += row.split()

Count how often each element (word) occurs in the list:

import collections
counts = collections.Counter(large_list)
print(counts)
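Putting the steps together, here is a minimal sketch that counts words directly into a Counter without building the intermediate list. The function name count_words and the sample rows are made up for illustration; the function accepts any iterable of cell values, so a DataFrame column such as df["Description"] should work as well.

```python
import collections

def count_words(rows):
    """Count whitespace-separated words across an iterable of text rows."""
    counts = collections.Counter()
    for row in rows:
        # Skip non-text cells such as None or NaN, which have no .split()
        if isinstance(row, str):
            counts.update(row.split())
    return counts

# Hypothetical sample rows; with a DataFrame you could pass df["Description"] instead.
sample_rows = ["Carla lives on Main Street", "Carla moved away", None]
counts = count_words(sample_rows)
print(counts["Carla"])   # 2
print(counts["Street"])  # 1
```

counts.most_common(10) would then list the ten most frequent words, which helps when you don't know in advance which words are common.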

You may want to add filters, for example keeping only words made up of letters (no numbers), removing stop words, and so on.
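A rough sketch of such filtering follows. The stop-word set here is a tiny made-up example, and lowercasing and stripping surrounding punctuation are extra assumptions on my part, so adjust them to your data.

```python
import collections

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "on", "a", "an", "of", "and"}

def count_filtered_words(rows, stop_words=STOP_WORDS):
    """Count words, keeping only alphabetic tokens that are not stop words."""
    counts = collections.Counter()
    for row in rows:
        if not isinstance(row, str):
            continue  # skip missing or non-text cells
        for word in row.split():
            # Assumption: treat "Street." and "street" as the same word
            word = word.strip(".,;:!?\"'()").lower()
            if word.isalpha() and word not in stop_words:
                counts[word] += 1
    return counts

# Hypothetical sample rows
sample_rows = ["Carla lives on 12 Main Street.", "carla moved to the street"]
counts = count_filtered_words(sample_rows)
print(counts["carla"])   # 2
print(counts["street"])  # 2
print(counts["12"])      # 0, numbers are filtered out
```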
