How to print only the string result of chunking with NLTK?
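I'm using NLTK and RegEx to analyze my text. The model correctly identifies the chunks I defined, but in the end every tagged word, plus "My_Chunk", shows up in the printed result. How can I print only the chunked part of the text ("My_Chunk")?
Here is a sample of my code: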
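import nltk

text = ['The absolutely kind professor asked students out whom he met in class']

for item in text:
    tokenized = nltk.word_tokenize(item)   # needs NLTK's tokenizer models (nltk.download())
    tagged = nltk.pos_tag(tokenized)        # needs NLTK's POS-tagger model (nltk.download())
    # One rule: optional adverbs, optional nouns, then a past-tense verb
    chunk = r"""My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}"""
    chunkParser = nltk.RegexpParser(chunk)
    chunked = chunkParser.parse(tagged)
    print(chunked)
    chunked.draw()  # opens a Tkinter window showing the tree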
The printed result is:
(S
  The/DT
  (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
  students/NNS
  out/RP
  whom/WP
  he/PRP
  (My_Chunk met/VBD)
  in/IN
  class/NN)
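The printed result above is an nltk.Tree. Its top-level children are either (word, tag) tuples for unchunked tokens or Tree subtrees for chunks, which is what any filtering has to distinguish. A minimal sketch that rebuilds this tree by hand (tags copied from the printed output, so no tokenizer or tagger models are needed) makes the structure visible:

```python
from nltk.tree import Tree

# Hand-built copy of the parse shown above; the tags are taken from the
# printed output rather than produced by nltk.pos_tag.
chunked = Tree("S", [
    ("The", "DT"),
    Tree("My_Chunk", [("absolutely", "RB"), ("kind", "NN"),
                      ("professor", "NN"), ("asked", "VBD")]),
    ("students", "NNS"),
    ("out", "RP"),
    ("whom", "WP"),
    ("he", "PRP"),
    Tree("My_Chunk", [("met", "VBD")]),
    ("in", "IN"),
    ("class", "NN"),
])

# Unchunked tokens are plain (word, tag) tuples; chunks are Tree subtrees
# whose leaves are themselves (word, tag) tuples.
for child in chunked:
    if isinstance(child, Tree):
        print("chunk:", child.label(), child.leaves())
    else:
        word, tag = child
        print("token:", word, tag)
```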
This should do it:
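# Walk the top-level children of the parse tree: unchunked tokens are
# (word, tag) tuples, chunks are nltk.tree.Tree subtrees.
for a in chunked:
    if isinstance(a, nltk.tree.Tree):
        if a.label() == "My_Chunk":
            print(a)                                        # the chunk, with tags
            print(" ".join([lf[0] for lf in a.leaves()]))  # just the words
            print()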
#(My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
#absolutely kind professor asked
#(My_Chunk met/VBD)
#met
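If you want the chunk strings back as data rather than printed, a variant of the same idea is to collect them with Tree.subtrees(filter=...), which also reaches chunks nested below the top level (for example from a multi-stage grammar). A minimal sketch, assuming a hypothetical helper name chunk_strings and a hand-built tree standing in for the question's chunked result:

```python
from nltk.tree import Tree

def chunk_strings(tree, label="My_Chunk"):
    """Return the words of every subtree labeled `label`, joined by spaces.

    chunk_strings is a hypothetical helper, not part of NLTK.
    """
    # Tree.subtrees() walks the whole tree in pre-order (root included),
    # so chunks below the top level are found as well.
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]

# Hand-built stand-in for the question's parse result (tags as printed there).
chunked = Tree("S", [
    ("The", "DT"),
    Tree("My_Chunk", [("absolutely", "RB"), ("kind", "NN"),
                      ("professor", "NN"), ("asked", "VBD")]),
    ("students", "NNS"), ("out", "RP"), ("whom", "WP"), ("he", "PRP"),
    Tree("My_Chunk", [("met", "VBD")]),
    ("in", "IN"), ("class", "NN"),
])

print(chunk_strings(chunked))  # ['absolutely kind professor asked', 'met']
```

With real input you would call chunk_strings(chunkParser.parse(tagged)) inside the question's loop instead of building the tree by hand.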