spaCy: count occurrences of a specific token in each sentence
I want to count the occurrences of the token "and" in each sentence of my corpus using spaCy, appending each sentence's count to a list. So far, the code below returns the total count of "and" for the whole corpus.
Example/Desired output for 3 sentences: [1, 0, 2]
Current output: [3]
doc = nlp(corpus)
nb_and = []
i = 0
for sent in doc.sents:
    for token in sent:
        if token.text == "and":
            i += 1
nb_and.append(i)
You need to reset i for each sentence and append it to nb_and after each sentence is processed:
for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
    nb_and.append(i)
Test code:
import spacy

nlp = spacy.load("en_core_web_trf")
corpus = "I see a cat and a dog. None seems to be unhappy. My mother and I wanted to buy a parrot and a tortoise."
doc = nlp(corpus)

nb_and = []
for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
    nb_and.append(i)

nb_and
# => [1, 0, 2]