Parsing multiple sentences with MaltParser using NLTK

Question

There have already been many questions about MaltParser and/or NLTK:

Malt Parser throwing class not found exception
How to use malt parser in python nltk
MaltParser Not Working in Python NLTK
NLTK MaltParser won't parse
Dependency parser using NLTK and MaltParser
Dependency Parsing using MaltParser and NLTK
Parsing with MaltParser engmalt
Parse raw text with MaltParser in Java

There is now a more stable version of the MaltParser API in NLTK (https://github.com/nltk/nltk/pull/944), but it has problems when parsing multiple sentences at once.

Parsing one sentence at a time seems to work fine:

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)

But parsing a list of sentences doesn't return a DependencyGraph object:

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)
>>> print(next(mp.parse_sents([sent,sent2])))
<listiterator object at 0x7f0a2e4d3d90> 
>>> print(next(next(mp.parse_sents([sent,sent2]))))
[{u'address': 0,
  u'ctag': u'TOP',
  u'deps': [2],
  u'feats': None,
  u'lemma': None,
  u'rel': u'TOP',
  u'tag': u'TOP',
  u'word': None},
 {u'address': 1,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'I'},
 {u'address': 2,
  u'ctag': u'NN',
  u'deps': [1, 11],
  u'feats': u'_',
  u'head': 0,
  u'lemma': u'_',
  u'rel': u'null',
  u'tag': u'NN',
  u'word': u'shot'},
 {u'address': 3,
  u'ctag': u'AT',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'AT',
  u'word': u'an'},
 {u'address': 4,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'elephant'},
 {u'address': 5,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'in'},
 {u'address': 6,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'my'},
 {u'address': 7,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'pajamas'},
 {u'address': 8,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'Time'},
 {u'address': 9,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'flies'},
 {u'address': 10,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'like'},
 {u'address': 11,
  u'ctag': u'NN',
  u'deps': [3, 4, 5, 6, 7, 8, 9, 10],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'dep',
  u'tag': u'NN',
  u'word': u'banana'}]

Why doesn't parse_sents() return an iterable of parse_one() results?
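A minimal sketch of why a single next() yields a listiterator, assuming the NLTK ParserI convention that parse_sents() returns one iterator of candidate parses per input sentence. ToyParser is hypothetical and returns plain strings instead of DependencyGraph objects, so it runs without Java or MaltParser.

```python
from typing import Iterator, List


# Hypothetical stand-in for a parser (not NLTK code): it "parses" a sentence
# into a bracketed string so the example runs without Java or MaltParser.
class ToyParser:
    def parse_sents(self, sents: List[List[str]]) -> Iterator[Iterator[str]]:
        # Outer iterator: one item per input sentence.
        # Inner iterator: the candidate parses for that sentence (here just one).
        for sent in sents:
            yield iter(["(" + " ".join(sent) + ")"])


parser = ToyParser()
sents = ["I shot an elephant".split(), "Time flies like banana".split()]

outer = parser.parse_sents(sents)
first_sentence_parses = next(outer)        # an iterator over parses, not a parse
first_parse = next(first_sentence_parses)  # first parse of sentence 1
second_parse = next(next(outer))           # first parse of sentence 2
print(first_parse)   # (I shot an elephant)
print(second_parse)  # (Time flies like banana)
```

Under this convention the extra level of nesting is expected, and next(next(mp.parse_sents([sent, sent2]))) should give a parse of sent alone. The graph printed above instead contains all 11 tokens of both sentences, so the merging is the real problem.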

I could, of course, just be lazy and do this:

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent1 = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> sentences = [sent1, sent2]
>>> for sent in sentences:
...     print(mp.parse_one(sent).tree())

But this isn't the solution I'm looking for. My question is why parse_sents() doesn't return an iterable of parse_one() results, and how to fix this in the NLTK code.
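Assuming that same convention (one iterator of parses per sentence), the lazy per-sentence loop could be replaced by a single parse_sents() call that keeps the first parse of each sentence. best_parses and StubParser are hypothetical names used for illustration; they are not part of NLTK.

```python
from typing import Any, Iterable, Iterator, List, Optional


def best_parses(parser: Any, sents: Iterable[List[str]]) -> List[Optional[Any]]:
    """Hypothetical helper: return the first parse of each sentence.

    Assumes parser.parse_sents() yields one iterator of parses per sentence;
    a sentence with no parses maps to None, mirroring parse_one().
    """
    return [next(parses, None) for parses in parser.parse_sents(sents)]


# Hypothetical stub parser used only to demonstrate the helper.
class StubParser:
    def parse_sents(self, sents: Iterable[List[str]]) -> Iterator[Iterator[str]]:
        for sent in sents:
            # An empty sentence gets no parses at all.
            yield iter([" ".join(sent)] if sent else [])


results = best_parses(StubParser(), [["I", "shot"], [], ["Time", "flies"]])
print(results)  # ['I shot', None, 'Time flies']
```

With the MaltParser object from the question this would read for graph in best_parses(mp, [sent1, sent2]): print(graph.tree()), but it can only give one graph per sentence once the underlying parse_sents() stops merging its input.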

After @NikitaAstrakhantsev's answer, I tried it and it now outputs a parse tree, but it seems confused: it merges the two sentences into one before parsing.

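from nltk.parse.malt import MaltParser

path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
# Initialize a MaltParser object with a pre-trained model.
mp = MaltParser(path_to_maltparser=path_to_maltparser, model=path_to_model)
sent = 'I shot an elephant in my pajamas'.split()
sent2 = 'Time flies like banana'.split()
# Parse a single sentence.
print(mp.parse_one(sent).tree())
# Parse both sentences and print the first parse of the first sentence.
print(next(next(mp.parse_sents([sent, sent2]))).tree())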

[out]:

(pajamas (shot I) an elephant in my)
(shot I (banana an elephant in my pajamas Time flies like))

Looking at the code, it seems to be doing something odd: https://github.com/nltk/nltk/blob/develop/nltk/parse/api.py#L45

Why does the parser abstract class in NLTK merge the two sentences into one before parsing? Am I calling parse_sents() incorrectly? If so, what is the correct way to call parse_sents()?

Answer 1

As far as I can see in your code sample, you don't call tree() on this line:

>>> print(next(next(mp.parse_sents([sent,sent2]))))

whereas you do call tree() in every case where you use parse_one().

Apart from that, I don't see why this could happen: ParserI's parse_one() method is not overridden in MaltParser, and all it does is call MaltParser's parse_sents(); see the code.
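A simplified sketch of that delegation, assuming ParserI routes parse_one() through parse() to parse_sents([sent]) when a subclass overrides parse_sents(). DelegatingParser is hypothetical and is not the NLTK source; it only shows that parse_one() always hands parse_sents() a one-element list, so it never exercises the multi-sentence path.

```python
from typing import Iterator, List, Optional


class DelegatingParser:
    """Hypothetical parser mirroring the delegation chain (not NLTK code)."""

    def parse_sents(self, sents: List[List[str]]) -> Iterator[Iterator[str]]:
        # Only this method does real work; the other two delegate to it.
        for sent in sents:
            yield iter(["/".join(sent)])

    def parse(self, sent: List[str]) -> Iterator[str]:
        # Wrap the single sentence in a one-element list, take its parses.
        return next(self.parse_sents([sent]))

    def parse_one(self, sent: List[str]) -> Optional[str]:
        # Take the first parse of that single sentence, or None if none exist.
        return next(self.parse(sent), None)


p = DelegatingParser()
print(p.parse_one(["Time", "flies"]))  # Time/flies
```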

Update: The line you're talking about is not called, because parse_sents() is overridden in MaltParser and is called directly.

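My only guess now is that the Java maltparser library doesn't correctly handle an input file containing several sentences (I mean this block, where Java is run). Perhaps the original Malt parser has changed its format and the sentence separator is no longer '\n\n'. Unfortunately, I can't run this code myself because maltparser.org has now been down for a second day. I did check that the input file has the expected format (sentences separated by double newlines), so it is very unlikely that the Python wrapper is merging the sentences.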
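This hypothesis could be checked on the files exchanged with Java. Below is a minimal sketch, assuming the input uses 10-column CoNLL-X lines with sentences separated by a single blank line. to_conll_input and count_conll_sentences are hypothetical helpers, and the fixed tag and placeholder columns are illustrative, not what NLTK actually writes.

```python
from typing import List


def to_conll_input(sents: List[List[str]], tag: str = "NN") -> str:
    """Hypothetical helper: write tokenised sentences as CoNLL-X lines.

    Each token becomes one tab-separated line with 10 columns
    (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL);
    sentences are separated by a blank line. The tag and placeholder values
    are assumptions for illustration only.
    """
    blocks = []
    for sent in sents:
        lines = [
            "\t".join([str(i), word, "_", tag, tag, "_", "0", "_", "_", "_"])
            for i, word in enumerate(sent, start=1)
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n\n"


def count_conll_sentences(text: str) -> int:
    """Count sentences as non-empty blocks separated by blank lines."""
    return sum(1 for block in text.strip().split("\n\n") if block.strip())


sents = ["I shot an elephant in my pajamas".split(), "Time flies like banana".split()]
conll = to_conll_input(sents)
print(count_conll_sentences(conll))  # 2
```

Applying count_conll_sentences to the file produced by the Java process would show whether the sentence boundaries survive that step: a count of 1 for two input sentences would mean the merge happens outside the Python wrapper.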
