Get, for each word, the numbers of the sentences in which it appears in a given text

Question

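I am using spaCy and I am looking for a program that counts the frequency of each word in a text and outputs, for each word, its number of occurrences, the word itself, and the numbers of the sentences in which it appears.

Sample input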

Python is cool. But Ocaml is cooler since it is purely functional.

Sample output

1 Python 1
3 is 1 2
1 cool 1
1 But 2
1 Ocaml 2
1 cooler 2
1 since 2
1 it 2
1 purely 2
1 functional 2

Answer 1

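I would split the sentences into words and build a dictionary in which each key is a word from the text and each value holds that word's count and the list of sentence numbers, like this: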

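text = "Python is cool. But Ocaml is cooler since it is purely functional."
specialSymbols = '.,;:'

# Naive segmentation: sentences are split on '. ', words on whitespace,
# and surrounding punctuation is stripped from each word.
words = [[word.strip(specialSymbols) for word in sentence.split()] for sentence in text.split('. ')]

# Map each word to [total count, list of 1-based sentence numbers].
d = {word: [0, []] for sentence in words for word in sentence}

for i, sentence in enumerate(words, start=1):
    for word in sentence:
        d[word][0] += 1
        if i not in d[word][1]:
            d[word][1].append(i)

# Print "<count> <word> <sentence numbers>" for each word.
for key, val in d.items():
    print(f'{val[0]} {key} {" ".join(str(i) for i in val[1])}')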
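Since the question mentions spaCy, here is a rough sketch of how the same dictionary idea might look with spaCy doing the sentence and word splitting instead of `str.split`. It assumes spaCy v3 with a blank English pipeline plus the rule-based `sentencizer` component, so no trained model is needed. The helper names `index_words`, `spacy_sentences` and `format_index` are made up for this example and are not part of spaCy.

```python
from collections import defaultdict


def index_words(sentences):
    """Map each word to (total count, list of 1-based sentence numbers).

    `sentences` is any iterable of sentences, each an iterable of word strings.
    """
    counts = defaultdict(int)
    where = defaultdict(list)
    for number, sentence in enumerate(sentences, start=1):
        for word in sentence:
            counts[word] += 1
            # Sentence numbers only grow, so checking the last entry is enough.
            if not where[word] or where[word][-1] != number:
                where[word].append(number)
    return {word: (counts[word], where[word]) for word in counts}


def spacy_sentences(text):
    """Yield the non-punctuation tokens of each sentence found by spaCy."""
    import spacy  # imported lazily so index_words also works without spaCy

    nlp = spacy.blank("en")      # tokenizer only, no trained model required
    nlp.add_pipe("sentencizer")  # rule-based sentence boundaries (spaCy v3)
    for sent in nlp(text).sents:
        yield [token.text for token in sent if not token.is_punct]


def format_index(index):
    """Return lines of the form '<count> <word> <sentence numbers>'."""
    return [
        f"{count} {word} {' '.join(map(str, numbers))}"
        for word, (count, numbers) in index.items()
    ]
```

To use it on the text from the question, something like `print("\n".join(format_index(index_words(spacy_sentences(text)))))` should do. Keeping the counting separate from the segmentation means you can swap the sentencizer for a trained pipeline later without touching the dictionary logic.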

python

nlp

spacy