finding the POS of the root of a noun_chunk with spacy
With spaCy you can easily loop over the noun_phrases of a text, as follows:
import spacy

S = 'This is an example sentence that should include several parts and also make clear that studying Natural language Processing is not difficult'
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)
[chunk.text for chunk in doc.noun_chunks]
# = ['an example sentence', 'several parts', 'Natural language Processing']
You can also get the "root" of each noun chunk:
[chunk.root.text for chunk in doc.noun_chunks]
# = ['sentence', 'parts', 'Processing']
How can I get the POS of each root word (even though the root of a noun_phrase always seems to be a noun), and how can I get the lemma, the shape, and the singular form of that specific word?
Is that possible?
Thanks.
Each chunk.root is a Token, on which you can access various attributes, including lemma_ and pos_ (or tag_, if you prefer Penn Treebank POS tags).
import spacy
S = 'This is an example sentence that should include several parts and also make ' \
    'clear that studying Natural language Processing is not difficult'
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)
for chunk in doc.noun_chunks:
    print('%-12s %-6s %s' % (chunk.root.text, chunk.root.pos_, chunk.root.lemma_))
sentence NOUN sentence
parts NOUN part
Processing NOUN processing
By the way... in this sentence "processing" is a noun, so its lemma is "processing", not "process", which is the lemma of the verb "processing".