Percentage count of verbs and nouns using spaCy?
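I want to use spaCy to compute the percentage breakdown of the parts of speech (POS) in a sentence, similar to
Count verbs, nouns, and other parts of speech with python's NLTK
I can currently detect and count the POS tags. How do I get the percentage breakdown?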
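import spacy
from collections import Counter

# Load the small English pipeline (install with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
# Count the coarse-grained POS tag of every token in the sentence
print(Counter(token.pos_ for token in nlp("The cat sat on the mat.")))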
Current output:
Counter({u'NOUN': 2, u'DET': 2, u'VERB': 1, u'ADP': 1, u'PUNCT': 1})
Expected output:
NOUN: 28.5%
DET: 28.5%
VERB: 14.28%
ADP: 14.28%
PUNCT: 14.28%
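For reference, each percentage is the tag's count divided by the total number of tokens (7 here, with the final period counted as a token), times 100. Rounded to two decimals, this gives 28.57% and 14.29%; the 28.5% and 14.28% above are the same values truncated:

```latex
\[
\text{share}(t) = 100 \cdot \frac{\text{count}(t)}{\sum_{t'} \text{count}(t')},
\qquad
100 \cdot \tfrac{2}{7} \approx 28.57\% \ (\text{NOUN, DET}),
\qquad
100 \cdot \tfrac{1}{7} \approx 14.29\% \ (\text{VERB, ADP, PUNCT})
\]
```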
How can I write the output to a pandas DataFrame?
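Something along these lines should do what you need:
# `c` is the Counter of POS tags from the question
sbase = sum(c.values())  # total number of tokens
for el, cnt in c.items():
    print(el, '{0:2.2f}%'.format((100.0 * cnt) / sbase))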
NOUN 28.57%
DET 28.57%
VERB 14.29%
ADP 14.29%
PUNCT 14.29%
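If you need the numbers rather than printed lines, the same calculation can be wrapped in a small function. This is a minimal sketch: the function name `pos_percentages` is hypothetical, and the tag list is hard-coded from the question so the example runs without spaCy (in practice it would be `[token.pos_ for token in nlp(text)]`).

```python
from collections import Counter


def pos_percentages(tags, ndigits=2):
    """Return (tag, percentage) pairs, most frequent tag first.

    `tags` is any iterable of POS labels, e.g. [token.pos_ for token in doc].
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return []  # avoid dividing by zero on empty input
    return [(tag, round(100.0 * n / total, ndigits))
            for tag, n in counts.most_common()]


# Tags for 'The cat sat on the mat.' (counts taken from the question's output)
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN", "PUNCT"]
for tag, pct in pos_percentages(tags):
    print("{}: {:.2f}%".format(tag, pct))
```

Using `most_common()` sorts the result by frequency, which the plain `c.items()` loop does not guarantee.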
Putting it together with the code from the question:
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")
c = Counter(token.pos_ for token in nlp("The cat sat on the mat."))
sbase = sum(c.values())  # total number of tokens
for el, cnt in c.items():
    # share of this tag as a percentage, two decimal places
    print(el, '{0:2.2f}%'.format((100.0 * cnt) / sbase))
Output:
DET 28.57%
NOUN 28.57%
VERB 14.29%
ADP 14.29%
PUNCT 14.29%
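For the pandas part of the question, one option (a sketch, assuming pandas is installed) is to let `pd.Series.value_counts` do the counting and build a DataFrame with a count column and a percentage column. The tag list is again hard-coded from the question's sentence, and the column names `pos`, `count` and `percent` are arbitrary choices.

```python
import pandas as pd

# POS tags for 'The cat sat on the mat.'; with spaCy this would be
# [token.pos_ for token in nlp('The cat sat on the mat.')]
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN", "PUNCT"]

counts = pd.Series(tags).value_counts()  # tag -> count, most frequent first
df = pd.DataFrame({
    "count": counts,
    "percent": (100.0 * counts / counts.sum()).round(2),
})
df = df.rename_axis("pos").reset_index()  # turn the tag index into a column
print(df)
```

`pd.Series(tags).value_counts(normalize=True)` returns the fractions directly if you only need the shares.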