How to get a parse tree using Python NLTK?

Given the following sentence:

The old oak tree from India fell down.

How can I use Python NLTK to get the following parse tree representation of the sentence?

(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))

I need a complete example, which I could not find on the web!


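Edit

I have gone through this book chapter to learn about parsing with NLTK, but the problem is that I need a grammar to parse sentences or phrases, and I do not have one. I also found this Stack Overflow post, which asks about grammars for parsing as well, but there is no convincing answer there.

So I am looking for a complete answer that gives me the parse tree of a sentence.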

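Here is an alternative solution that uses StanfordCoreNLP instead of nltk. There are a few libraries built on top of StanfordCoreNLP; I personally use pycorenlp to parse sentences.

First you have to download the stanford-corenlp-full folder, which contains the *.jar files, and then start the server from inside that folder (the default port is 9000).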

export CLASSPATH="`find . -name '*.jar'`"
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer [port?] # run server

Then, in Python, you can run the following to parse the sentence:

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

text = "The old oak tree from India fell down."

output = nlp.annotate(text, properties={
  'annotators': 'parse',
  'outputFormat': 'json'
})

print(output['sentences'][0]['parse']) # bracketed parse tree of the first sentence
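
The `parse` field is a bracketed string, so it can be loaded into an nltk `Tree` for further processing. Below is a minimal sketch, assuming the server returns the tree shown in the question; the string is hard-coded so the snippet runs without a server, and the variable names are only illustrative.

```python
from nltk.tree import Tree

# Assumption: this is the bracketed string found in
# output['sentences'][0]['parse'] (hard-coded here for illustration).
parse_str = (
    "(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) "
    "(PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))"
)

tree = Tree.fromstring(parse_str)

print(tree.label())   # label of the root node
print(tree.leaves())  # the words of the sentence, in order
print(tree.pos())     # (word, POS tag) pairs

# Collect the words under every NP node (outermost phrases come first).
noun_phrases = [
    " ".join(subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Once the string is a `Tree`, the usual nltk tree methods (`subtrees`, `leaves`, `pos`) are available, which is usually easier than working on the raw string.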

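This is an older question, but you can use nltk together with bllipparser. Here is a longer example from nltk. After some fiddling, I ended up using the following myself:

Installation (with nltk already installed):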

sudo python3 -m nltk.downloader bllip_wsj_no_aux
pip3 install bllipparser

Usage:

from nltk.data import find
from bllipparser import RerankingParser

model_dir = find('models/bllip_wsj_no_aux').path
parser = RerankingParser.from_unified_model_dir(model_dir)

best = parser.parse("The old oak tree from India fell down.")

print(best.get_reranker_best())
print(best.get_parser_best())

Output:

-80.435259246021 -23.831876011253 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down))) (. .)))
-79.703612178593 -24.505514522222 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (ADVP (RB down))) (. .)))
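
The two candidates differ only in how "down" is analysed (a particle under PRT versus an adverb under ADVP). One way to see this programmatically is to drop the two leading scores and load each bracketed string into an nltk `Tree`. This is only a sketch: the strings are copied from the output above, and the helper names `to_tree` and `parent_label_of` are made up for illustration.

```python
from nltk.tree import Tree

# Output lines copied from above: two scores followed by the bracketed parse.
reranker_line = (
    "-80.435259246021 -23.831876011253 (S1 (S (NP (NP (DT The) (JJ old) "
    "(NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) "
    "(VP (VBD fell) (PRT (RP down))) (. .)))"
)
parser_line = (
    "-79.703612178593 -24.505514522222 (S1 (S (NP (NP (DT The) (JJ old) "
    "(NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) "
    "(VP (VBD fell) (ADVP (RB down))) (. .)))"
)


def to_tree(scored_line):
    """Hypothetical helper: skip the scores and parse from the first '('."""
    return Tree.fromstring(scored_line[scored_line.index("("):])


def parent_label_of(tree, word):
    """Hypothetical helper: label of the phrase above the POS node of `word`."""
    path = tree.leaf_treeposition(tree.leaves().index(word))
    return tree[path[:-2]].label()  # leaf -> POS node -> enclosing phrase


reranker_tree = to_tree(reranker_line)
parser_tree = to_tree(parser_line)

# Tokens whose POS tag differs between the two parses.
diff = [(a, b) for a, b in zip(reranker_tree.pos(), parser_tree.pos()) if a != b]
print(diff)
print(parent_label_of(reranker_tree, "down"), parent_label_of(parser_tree, "down"))
```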

To get a parse tree using the nltk library, you can use the following code:

# Import required libraries
import nltk
# Newer NLTK releases may instead need 'punkt_tab' and 'averaged_perceptron_tagger_eng'
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize, RegexpParser

# Example text (the sentence from the question)
sample_text = "The old oak tree from India fell down."

# Tokenize the sentence and tag each token with its part of speech
tagged = pos_tag(word_tokenize(sample_text))

# Chunk grammar: later rules may refer to chunks built by earlier rules
chunker = RegexpParser("""
                    NP: {<DT>?<JJ>*<NN>} #To extract Noun Phrases
                    P: {<IN>}            #To extract Prepositions
                    V: {<V.*>}           #To extract Verbs
                    PP: {<P> <NP>}       #To extract Prepositional Phrases
                    VP: {<V> <NP|PP>*}   #To extract Verb Phrases
                    """)

# Chunk the tagged sentence and print the resulting tree
output = chunker.parse(tagged)
print("After Extracting\n", output)
# The output looks something like this:
# (S
#   (NP The/DT old/JJ oak/NN)
#   (NP tree/NN)
#   (P from/IN)
#   India/NNP
#   (VP (V fell/VBD))
#   down/RB
#   ./.)
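
Because `<NN>` matches exactly one common noun, the grammar above splits "tree" into its own chunk and leaves "India" (tagged NNP) unchunked, so no prepositional phrase is formed. The sketch below uses a variant grammar that is an assumption for illustration, not part of the answer above: it allows noun sequences and proper nouns. The tagged tokens are hard-coded from the output above so that no tagger model has to be downloaded.

```python
from nltk import RegexpParser

# (word, tag) pairs copied from the tagger output shown above.
tagged = [
    ("The", "DT"), ("old", "JJ"), ("oak", "NN"), ("tree", "NN"),
    ("from", "IN"), ("India", "NNP"), ("fell", "VBD"), ("down", "RB"),
    (".", "."),
]

# Illustrative variant grammar; rules are applied in order, and later
# rules can refer to chunks created by earlier ones.
chunker = RegexpParser(r"""
    NP: {<DT>?<JJ>*<NN.*>+}   # optional determiner, adjectives, one or more nouns
    PP: {<IN><NP>}            # preposition followed by a noun phrase
    VP: {<VB.*><RP|RB>?}      # verb with an optional particle or adverb
""")

chunks = chunker.parse(tagged)
print(chunks)

# Words of every NP chunk, including the one nested inside the PP.
np_words = [
    [word for word, _tag in subtree.leaves()]
    for subtree in chunks.subtrees(lambda t: t.label() == "NP")
]
print(np_words)
```

Note that this is still shallow chunking: it groups adjacent tokens into phrases but does not build the nested NP-over-PP structure asked for in the question.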

You can also get a diagram of this tree:

# To draw the parse tree
output.draw()

Calling output.draw() opens a window that displays the tree diagram.
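
If the exact nested tree from the question is needed from nltk alone, one option is to supply a small hand-written context-free grammar and run nltk's chart parser over it. The grammar below is an assumption written only for this one sentence (it is not a general English grammar), and the final period is dropped because the target tree in the question contains no punctuation node.

```python
import nltk

# Toy grammar covering only the example sentence (illustrative assumption).
# The left-hand side of the first rule (ROOT) is used as the start symbol.
grammar = nltk.CFG.fromstring("""
    ROOT -> S
    S -> NP VP
    NP -> NP PP | DT JJ NN NN | NNP
    PP -> IN NP
    VP -> VBD PRT
    PRT -> RP
    DT -> 'The'
    JJ -> 'old'
    NN -> 'oak' | 'tree'
    NNP -> 'India'
    IN -> 'from'
    VBD -> 'fell'
    RP -> 'down'
""")

parser = nltk.ChartParser(grammar)
tokens = "The old oak tree from India fell down".split()

trees = list(parser.parse(tokens))
tree = trees[0]

# Collapse the pretty-printed tree onto a single line.
flat = " ".join(str(tree).split())
print(flat)

# Text rendering of the tree; unlike draw(), this needs no GUI window.
tree.pretty_print()
```

With this grammar the single-line form should match the bracketed tree given in the question. For arbitrary sentences a hand-written grammar quickly becomes impractical, which is why the other answers rely on trained parsers.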