ne_chunk without pos_tag in NLTK
I'm trying to chunk a sentence using ne_chunk and pos_tag in nltk.
from nltk import tag
from nltk.tag import pos_tag
from nltk.tree import Tree
from nltk.chunk import ne_chunk
sentence = "Michael and John is reading a booklet in a library of Jakarta"
tagged_sent = pos_tag(sentence.split())
print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
print(print_chunk)
This is the result:
[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]
My question: is it possible to leave out the pos_tag (such as NNP above) and include only the 'GPE' and 'PERSON' trees?
And what does 'GPE' mean?
Thanks in advance.
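The named entity chunker gives you a tree that contains both the chunks and the tags. You can't change that, but you can strip the tags out. Starting from your tagged_sent: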
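import nltk
from nltk import Tree

chunks = nltk.ne_chunk(tagged_sent)
simple = []
for elt in chunks:
    if isinstance(elt, Tree):
        # Named-entity chunk: keep the label, drop the POS tags
        simple.append(Tree(elt.label(), [word for word, tag in elt]))
    else:
        # Ordinary (word, tag) pair: keep only the word
        simple.append(elt[0])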
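If you only want the chunks, omit the else: clause above. You can adapt the code to wrap the chunks any way you want. I used an nltk Tree to keep the changes to a minimum. Note that some chunks consist of several words (try adding "New York" to your example), so the contents of a chunk must be a list, not a single element.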
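The point about multi-word chunks can be checked without running the chunker. The sketch below wraps the loop above in a function (the name strip_pos_tags is hypothetical, not part of NLTK) and feeds it a hand-built tree that merely stands in for ne_chunk output, so no model download is assumed:

```python
from nltk import Tree


def strip_pos_tags(chunked):
    """Return the items of a chunked sentence with the POS tags removed."""
    simple = []
    for elt in chunked:
        if isinstance(elt, Tree):
            # Chunk: keep the label and a list of its words
            simple.append(Tree(elt.label(), [word for word, _tag in elt]))
        else:
            # Plain (word, tag) pair: keep only the word
            simple.append(elt[0])
    return simple


# Hand-built stand-in for ne_chunk output (an assumption, not real chunker output)
chunked = Tree('S', [
    Tree('PERSON', [('John', 'NNP')]),
    ('lives', 'VBZ'),
    ('in', 'IN'),
    Tree('GPE', [('New', 'NNP'), ('York', 'NNP')]),
])

simple = strip_pos_tags(chunked)
print(simple)
```

The 'GPE' chunk keeps both of its words in a list, which is why a chunk's contents cannot be a single element.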
PS. "GPE" stands for "geo-political entity" (so the chunker obviously made a mistake here). You can find a list of "commonly used tags" in the NLTK book.
Most likely, a slight modification of the linked code, with the labels included, is what you need.
is it possible not to include pos_tag (like NNP above) and only include Tree 'GPE','PERSON'?
Yes, simply traverse the Tree object =)
>>> from nltk import Tree, pos_tag, ne_chunk
>>> sentence = "Michael and John is reading a booklet in a library of Jakarta"
>>> tagged_sent = ne_chunk(pos_tag(sentence.split()))
>>> tagged_sent
Tree('S', [Tree('GPE', [('Michael', 'NNP')]), ('and', 'CC'), Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'), ('reading', 'VBG'), ('a', 'DT'), ('booklet', 'NN'), ('in', 'IN'), ('a', 'DT'), ('library', 'NN'), ('of', 'IN'), Tree('GPE', [('Jakarta', 'NNP')])])
>>> from nltk.sem.relextract import NE_CLASSES
>>> ace_tags = NE_CLASSES['ace']
>>> for node in tagged_sent:
...     if isinstance(node, Tree) and node.label() in ace_tags:
...         words, tags = zip(*node.leaves())
...         print(node.label() + '\t' + ' '.join(words))
...
GPE Michael
PERSON John
GPE Jakarta
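If the entities are needed as data rather than printed lines, the same traversal can collect them per label. This is only a sketch: entities_by_label is a hypothetical helper, and the tree is built by hand from part of the output shown above so that it runs without the pretrained chunker.

```python
from collections import defaultdict

from nltk import Tree


def entities_by_label(chunked):
    """Group entity strings by chunk label, ignoring the POS tags."""
    grouped = defaultdict(list)
    for node in chunked:
        if isinstance(node, Tree):
            # leaves() yields (word, tag) pairs; keep only the words
            words = [word for word, _tag in node.leaves()]
            grouped[node.label()].append(' '.join(words))
    return dict(grouped)


# Hand-built copy of part of the tree shown above (no chunker model required)
chunked = Tree('S', [
    Tree('GPE', [('Michael', 'NNP')]),
    ('and', 'CC'),
    Tree('PERSON', [('John', 'NNP')]),
    ('is', 'VBZ'),
    ('reading', 'VBG'),
    Tree('GPE', [('Jakarta', 'NNP')]),
])

result = entities_by_label(chunked)
print(result)  # {'GPE': ['Michael', 'Jakarta'], 'PERSON': ['John']}
```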
What 'GPE' means?
GPE stands for "Geo-Political Entity".
The GPE tag comes from the ACE dataset.
There are two pre-trained NE chunkers available, see https://github.com/nltk/nltk/blob/develop/nltk/chunk/init.py#L164
Three tag sets are supported: https://github.com/nltk/nltk/blob/develop/nltk/sem/relextract.py#L31
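To see which labels each tag set contains, the NE_CLASSES dictionary imported earlier can be inspected directly. A minimal sketch, assuming only that NE_CLASSES maps tag-set names to lists of labels, as the 'ace' lookup above suggests; the exact contents may differ between NLTK versions:

```python
from nltk.sem.relextract import NE_CLASSES

# Print every tag set together with the labels it defines
for name, labels in sorted(NE_CLASSES.items()):
    print(name, labels)

# Check whether a given label belongs to the ACE tag set
ace_labels = set(NE_CLASSES['ace'])
print('GPE' in ace_labels)
```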