How to print only the string result of chunking with NLTK?

Question

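I am using NLTK and RegEx to analyze my text. The model correctly identifies the chunks I defined, but in the end the printed result shows all the tagged words as well as "My_Chunk". How can I print only the chunked parts of the text ("My_Chunk")?

Here is a sample of my code: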

import re
import nltk

text = ['The absolutely kind professor asked students out whom he met in class']

for item in text:
    tokenized = nltk.word_tokenize(item)
    tagged = nltk.pos_tag(tokenized)

    chunk = r"""My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}"""
    chunkParser = nltk.RegexpParser(chunk)

    chunked = chunkParser.parse(tagged)
    print(chunked)
    chunked.draw()

The printed result is:

(S
  The/DT
  (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
  students/NNS
  out/RP
  whom/WP
  he/PRP
  (My_Chunk met/VBD)
  in/IN
  class/NN)

Answer 1

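This should do it. Iterate over the children of the parsed tree, keep only the subtrees labeled "My_Chunk", and join the words of their leaves: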

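for a in chunked:
    # Children of the parsed tree are either (word, tag) tuples or Tree objects (chunks)
    if isinstance(a, nltk.tree.Tree):
        if a.label() == "My_Chunk":
            print(a)
            # Each leaf is a (word, tag) tuple; keep only the word
            print(" ".join([lf[0] for lf in a.leaves()]))
            print()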

#(My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
#absolutely kind professor asked

#(My_Chunk met/VBD)
#met
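
As a variation on the loop above, the same filtering can probably be done with `Tree.subtrees()` and its `filter` argument, which also finds chunks nested deeper than the top level. The following is only a sketch: the helper name `extract_chunk_phrases` is made up for illustration, and the tokens are hand-tagged from the printed result in the question (an assumption, so the example runs without downloading the tokenizer or tagger models).

```python
import nltk

# Hand-tagged tokens copied from the question's printed output (assumption:
# this is what nltk.pos_tag returned for the sentence).
tagged = [
    ("The", "DT"), ("absolutely", "RB"), ("kind", "NN"), ("professor", "NN"),
    ("asked", "VBD"), ("students", "NNS"), ("out", "RP"), ("whom", "WP"),
    ("he", "PRP"), ("met", "VBD"), ("in", "IN"), ("class", "NN"),
]

# Same chunk grammar as in the question
chunk_parser = nltk.RegexpParser(r"My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}")
chunked = chunk_parser.parse(tagged)


def extract_chunk_phrases(tree, label):
    """Hypothetical helper: return the words of every subtree with the given label."""
    phrases = []
    # subtrees() walks the whole tree; the filter keeps only matching labels
    for subtree in tree.subtrees(filter=lambda t: t.label() == label):
        # Each leaf is a (word, tag) tuple; join only the words
        phrases.append(" ".join(word for word, _tag in subtree.leaves()))
    return phrases


for phrase in extract_chunk_phrases(chunked, "My_Chunk"):
    print(phrase)
```

With the tags shown in the question, this should print the same two phrases as the loop above, "absolutely kind professor asked" and "met", without the tree notation.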

Tags: python, regex, chunking, nltk