Why am I not getting 'PERSON' and 'GPE' as labels after chunking using `nltk.ne_chunk`?
I am using nltk.ne_chunk() like this:
import nltk
sent = "Azhar is asking what is weather in Chicago today? "
chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)), binary=True)
print(list(chunks))
and getting output like this:
[Tree('NE', [('Azhar', 'NNP')]), ('is', 'VBZ'), ('asking', 'VBG'), ('what', 'WP'), ('is',
'VBZ'), ('weather', 'NN'), ('in', 'IN'), Tree('NE', [('Chicago', 'NNP')]), ('today', 'NN'),
('?', '.')]
But I am expecting output like this:
[Tree('PERSON', [('Azhar', 'NNP')]), ('is', 'VBZ'), ('asking', 'VBG'), ('what', 'WP'), ('is',
'VBZ'), ('weather', 'NN'), ('in', 'IN'), Tree('GPE', [('Chicago', 'NNP')]), ('today', 'NN'),
('?', '.')]
Can someone tell me what I am doing wrong here?
After installing the spaCy library and downloading the relevant model (en_core_web_sm), you can extract the named entities directly:
import spacy
NER = spacy.load("en_core_web_sm")
sent = "Azhar is asking what is weather in Chicago today? "
text1 = NER(sent)
for word in text1.ents:
    print(word.text, word.label_)
Output:
Azhar PERSON
Chicago GPE
today DATE
Update
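nltk.ne_chunk returns a nested nltk.tree.Tree object, so you have to traverse the Tree object to get to the NEs. tree2conlltags from nltk.chunk does exactly that. Note that the call below omits binary=True: with binary=True, ne_chunk labels every named entity simply as NE, whereas the default (binary=False) assigns specific labels such as PERSON, ORGANIZATION, and GPE.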
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.chunk import tree2conlltags
sentence = "Azhar is asking what is weather in Chicago today?"
print(tree2conlltags(ne_chunk(pos_tag(word_tokenize(sentence)))))
Output in IOB format:
[('Azhar', 'NNP', 'B-GPE'), ('is', 'VBZ', 'O'), ('asking', 'VBG', 'O'), ('what', 'WP', 'O'), ('is', 'VBZ', 'O'), ('weather', 'NN', 'O'), ('in', 'IN', 'O'), ('Chicago', 'NNP', 'B-GPE'), ('today', 'NN', 'O'), ('?', '.', 'O')]
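IOB triples like the ones above can be collapsed into one entry per entity. The following is a minimal sketch in plain Python; `iob_to_entities` is a hypothetical helper name, not part of NLTK, and it assumes the `(word, pos, iob)` triple layout shown in the output above.

```python
def iob_to_entities(tagged):
    """Collapse (word, pos, iob) triples into (entity_text, label) pairs.

    Hypothetical helper, not an NLTK function.
    """
    entities = []
    words, label = [], None
    for word, _pos, iob in tagged:
        if iob.startswith("B-") or (iob.startswith("I-") and iob[2:] != label):
            # A new entity starts here; store any entity still in progress.
            if words:
                entities.append((" ".join(words), label))
            words, label = [word], iob[2:]
        elif iob.startswith("I-"):
            # Continuation of the current entity.
            words.append(word)
        else:
            # An 'O' tag closes any entity in progress.
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities


# The IOB output shown above, copied verbatim.
tagged = [('Azhar', 'NNP', 'B-GPE'), ('is', 'VBZ', 'O'), ('asking', 'VBG', 'O'),
          ('what', 'WP', 'O'), ('is', 'VBZ', 'O'), ('weather', 'NN', 'O'),
          ('in', 'IN', 'O'), ('Chicago', 'NNP', 'B-GPE'), ('today', 'NN', 'O'),
          ('?', '.', 'O')]
print(iob_to_entities(tagged))
```

Multi-word entities (a `B-` tag followed by `I-` tags with the same label) are joined with a single space, which is a simplification that ignores the original whitespace.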
More information on this here!
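As an alternative to tree2conlltags, the chunk tree can also be walked directly. The sketch below assumes the structure shown in the question: entity chunks are subtrees with a `label()` method, and ordinary tokens are plain `(word, tag)` tuples. `named_entities` is a hypothetical helper name, not an NLTK function.

```python
def named_entities(tree):
    """Return (entity_text, label) pairs from a chunk tree.

    Hypothetical helper: it assumes entity chunks are subtrees exposing
    label(), while ordinary tokens are plain (word, tag) tuples.
    """
    entities = []
    for node in tree:
        if hasattr(node, "label"):
            # Join the words of the chunk; each leaf is a (word, tag) pair.
            text = " ".join(word for word, _tag in node)
            entities.append((text, node.label()))
    return entities
```

It would be called as `named_entities(ne_chunk(pos_tag(word_tokenize(sentence))))`. The labels it returns are whatever the chunker assigned, so with `binary=True` every pair would carry the label `NE`.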