Parse NLTK tree output into a list of noun phrases

I have the following text:

text  = '''If you're in construction or need to pass fire inspection, or just want fire resistant materials for peace of mind, this is the one to use. Check out 3rd party sellers as well Skylite'''

I applied NLTK chunking to it and got a tree as output.

import nltk

# Split into sentences, tokenize each one, then POS-tag the tokens
sentences = nltk.sent_tokenize(text)
sentences = [nltk.word_tokenize(sent) for sent in sentences]
sentences = [nltk.pos_tag(sent) for sent in sentences]

grammar = """NP: {<DT>?<JJ>*<NN.*>+}
       RELATION: {<V.*>}
                 {<DT>?<JJ>*<NN.*>+}
       ENTITY: {<NN.*>}"""

cp = nltk.RegexpParser(grammar)
for i in sentences:
    result = cp.parse(i)   # returns an nltk.Tree
    print(result)
    print(type(result))
    result.draw()          # opens a GUI window showing the tree

The output looks like this:

(S If/IN you/PRP (RELATION 're/VBP) in/IN (NP construction/NN) or/CC (NP need/NN) to/TO (RELATION pass/VB) (NP fire/NN inspection/NN) ,/, or/CC just/RB (RELATION want/VB) (NP fire/NN) (NP resistant/JJ materials/NNS) for/IN (NP peace/NN) of/IN (NP mind/NN) ,/, this/DT (RELATION is/VBZ) (NP the/DT one/NN) to/TO (RELATION use/VB) ./.)

How can I get the noun phrases as a list of strings, like this:

[construction, need, fire inspection, fire, resistant materials, peace, mind, the one]

Any suggestions?

Like this:

noun_phrases_list = [[' '.join(leaf[0] for leaf in tree.leaves()) 
                      for tree in cp.parse(sent).subtrees() 
                      if tree.label()=='NP'] 
                      for sent in sentences]
#[['construction', 'need', 'fire inspection', 'fire', 'resistant materials', 
#  'peace', 'mind', 'the one'], 
# ['party sellers', 'Skylite']]
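
For a version that runs without downloading any tokenizer or tagger models, here is a minimal sketch of the same idea. The hand-tagged tokens and the helper name extract_phrases are assumptions made for illustration; the tags are copied from the tree printed in the question.

```python
import nltk

# Hand-tagged tokens (an assumption: tags copied from the question's output,
# so no tagger model is needed to run this sketch).
tagged = [
    ("in", "IN"), ("construction", "NN"), ("or", "CC"), ("need", "NN"),
    ("to", "TO"), ("pass", "VB"), ("fire", "NN"), ("inspection", "NN"),
]

cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = cp.parse(tagged)


def extract_phrases(tree, label="NP"):
    """Hypothetical helper: join the words of every subtree carrying `label`."""
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(lambda t: t.label() == label)
    ]


noun_phrases = extract_phrases(tree)
print(noun_phrases)
```

Each leaf of the chunk tree is a (word, tag) tuple, which is why the join takes only the first element of each leaf.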

You can pass a filter to subtrees(), like this:

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentences[1])
result.subtrees(filter=lambda t: t.label() == 'NP')  # returns a generator of matching subtrees
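
Since subtrees() only returns a generator, you still need to consume it to get strings. A rough sketch, assuming hand-tagged tokens (taken from the tree in the question) instead of the output of nltk.pos_tag:

```python
import nltk

# Hand-tagged tokens (an assumption, so the sketch needs no tagger model).
tagged = [
    ("just", "RB"), ("want", "VB"), ("fire", "NN"), ("resistant", "JJ"),
    ("materials", "NNS"), ("for", "IN"), ("peace", "NN"),
]

cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
result = cp.parse(tagged)

# The generator yields nltk.Tree objects; join the words of each one.
np_subtrees = result.subtrees(filter=lambda t: t.label() == "NP")
phrases = [" ".join(word for word, tag in st.leaves()) for st in np_subtrees]
print(phrases)
```

Note that this grammar ends in <NN> rather than <NN.*>+, so plural or proper nouns (NNS, NNP) and multi-noun phrases are not chunked the way they are with the grammar in the question; "resistant materials" is dropped here for that reason.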