Get, for each word, the numbers of the sentences in which it appears in a given text
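I am using spaCy and am looking for a program that counts the frequency of each word in a text and outputs each word together with its number of occurrences and the numbers of the sentences in which it appears.
Sample input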
Python is cool. But Ocaml is cooler since it is purely functional.
Sample output
1 Python 1
3 is 1 2
1 cool 1
1 But 2
1 Ocaml 2
1 cooler 2
1 since 2
1 it 2
1 purely 2
1 functional 2
I would split the sentences into words and build a dictionary in which each key is a word from the text, like this:
text = "Python is cool. But Ocaml is cooler since it is purely functional."
specialSymbols = '.,;:'
words = [[word.strip(specialSymbols) for word in sentence.split(' ')] for sentence in text.split('. ')]
d = {word: [0, []] for sentence in words for word in sentence}
for i, sentence in enumerate(words):
    for word in sentence:
        d[word][0] += 1
        if i + 1 not in d[word][1]:
            d[word][1].append(i + 1)
for key, val in d.items():
    print(f'{val[0]} {key} {" ".join([str(i) for i in val[1]])}')
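The same idea can be wrapped in a reusable function. What follows is only a sketch, not a definitive implementation: the names `word_sentence_index` and `format_index` are made up for illustration, and the two regular expressions encode assumptions about what counts as a sentence boundary and as a word. If spaCy is available, iterating over `doc.sents` and the tokens of each sentence could supply the sentences and words instead of the regular expressions.

```python
import re
from collections import defaultdict


def word_sentence_index(text):
    """Map each word to [total count, sorted 1-based sentence numbers]."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    counts = defaultdict(int)
    sentence_ids = defaultdict(set)
    for number, sentence in enumerate(sentences, start=1):
        # Assumption: a word is a run of ASCII letters, digits, or apostrophes.
        for word in re.findall(r"[A-Za-z0-9']+", sentence):
            counts[word] += 1
            sentence_ids[word].add(number)
    # Plain dicts keep insertion order, so words appear in first-seen order.
    return {word: [counts[word], sorted(sentence_ids[word])] for word in counts}


def format_index(index):
    """Render each entry as 'count word sentence-numbers'."""
    return [
        f"{count} {word} {' '.join(map(str, ids))}"
        for word, (count, ids) in index.items()
    ]


if __name__ == "__main__":
    sample = "Python is cool. But Ocaml is cooler since it is purely functional."
    for line in format_index(word_sentence_index(sample)):
        print(line)
```

Storing the sentence numbers in a set avoids the repeated `not in` list lookup, and splitting on `!` and `?` as well as `.` covers sentences that do not end in a full stop.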