StanfordNLP, CoreNLP, spaCy - different dependency graphs

I'm trying to extract very basic information (e.g., subject->predicate->object triples) from sentences using simple rules/patterns defined over the dependency graph. I started with StanfordNLP since it was easy to set up and utilizes the GPU for better performance. However, I've noticed that for some sentences the resulting dependency graph did not look the way I would have expected (I'm no expert, though). I therefore tried two other solutions: spaCy and Stanford CoreNLP (I understand these are maintained by different groups?).

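For the example sentence "Tom made Sam believe that Alice has cancer." I printed the dependencies produced by all three tools. CoreNLP and spaCy produce the same dependencies, but these differ from the ones produced by StanfordNLP. I'm therefore inclined to switch to CoreNLP and spaCy (another advantage is that they offer NER out of the box).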

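Does anyone have more experience or feedback that could help me decide where to go from here? I don't expect CoreNLP and spaCy to always produce the same dependency graph, but in the example sentence, treating Sam as obj (as StanfordNLP does) rather than as nsubj (as CoreNLP and spaCy do) seems like a significant difference.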

Format:
token   dependency_tag   parent_token

StanfordNLP
Tom     nsubj   made
made    ROOT    ROOT
Sam     obj     made
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  obj     has
.       punct   made

CoreNLP
Tom     nsubj   made
made    ROOT    ROOT
Sam     nsubj   believe
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  dobj    has
.       punct   made

spaCy
Tom     nsubj   made
made    ROOT    ROOT
Sam     nsubj   believe
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  dobj    has
.       punct   made
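To make the practical impact of the Sam label concrete, here is a minimal sketch of the kind of simple rule described in the question, applied to the parses exactly as printed above. The rule itself (pair every nsubj child of a word with every object child of the same word) and the names `Arc` and `extract_triples` are hypothetical illustrations, not any library's API. It also assumes that obj and dobj mean the same thing: obj is the Universal Dependencies v2 name for what v1 called dobj, which is why the tables show both. Heads are matched by token string, which only works here because no word repeats; real code should use token indices.

```python
from typing import List, NamedTuple, Tuple

class Arc(NamedTuple):
    token: str  # the dependent word
    dep: str    # dependency_tag
    head: str   # parent_token ("ROOT" for the root)

# Parses copied from the tables above. The spaCy table is identical to CoreNLP's.
STANFORDNLP = [
    Arc("Tom", "nsubj", "made"), Arc("made", "ROOT", "ROOT"),
    Arc("Sam", "obj", "made"), Arc("believe", "ccomp", "made"),
    Arc("that", "mark", "has"), Arc("Alice", "nsubj", "has"),
    Arc("has", "ccomp", "believe"), Arc("cancer", "obj", "has"),
    Arc(".", "punct", "made"),
]
CORENLP = [
    Arc("Tom", "nsubj", "made"), Arc("made", "ROOT", "ROOT"),
    Arc("Sam", "nsubj", "believe"), Arc("believe", "ccomp", "made"),
    Arc("that", "mark", "has"), Arc("Alice", "nsubj", "has"),
    Arc("has", "ccomp", "believe"), Arc("cancer", "dobj", "has"),
    Arc(".", "punct", "made"),
]

# Assumption: treat UD v2 "obj" and UD v1 / spaCy "dobj" as the same relation.
OBJ_LABELS = {"obj", "dobj"}

def extract_triples(parse: List[Arc]) -> List[Tuple[str, str, str]]:
    """Hypothetical rule: (subject, predicate, object) for every word that
    has both an nsubj child and an object child."""
    triples = []
    # Candidate predicates = every head word, in order of first appearance.
    for pred in dict.fromkeys(a.head for a in parse if a.head != "ROOT"):
        subjects = [a.token for a in parse if a.head == pred and a.dep == "nsubj"]
        objects = [a.token for a in parse if a.head == pred and a.dep in OBJ_LABELS]
        triples.extend((s, pred, o) for s in subjects for o in objects)
    return triples

print(extract_triples(STANFORDNLP))  # [('Tom', 'made', 'Sam'), ('Alice', 'has', 'cancer')]
print(extract_triples(CORENLP))      # [('Alice', 'has', 'cancer')]
```

With the StanfordNLP parse the rule yields (Tom, made, Sam); with the CoreNLP/spaCy parse, Sam is the subject of believe, which has no object child, so that triple disappears. A rule set over one tool's output therefore may not transfer unchanged to another's.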

I'm not sure how to solve your problem, but I suggest reading the Stanford CoreNLP documentation carefully: https://nlp.stanford.edu/software/lex-parser.shtml

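This package includes several constituency parsers and dependency parsers you can use. Looking only at constituency parsing, there is an option to retrieve the k-best parses; if you convert each of them to dependencies, you will very likely get a different dependency graph for each one.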
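When comparing alternative parses (from different tools, or from several of the k-best parses), it helps to list only the arcs on which they disagree. The sketch below is a hypothetical helper, not part of any of the libraries mentioned. It represents each parse as a dict from token to (dependency_tag, parent_token), which again assumes no word repeats in the sentence, and takes an optional label-alias map so that naming differences such as dobj vs. obj can be ignored.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, str]    # (dependency_tag, parent_token)
Parse = Dict[str, Arc]   # token -> arc; assumes every token string is unique

def diff_parses(a: Parse, b: Parse,
                aliases: Optional[Dict[str, str]] = None
                ) -> List[Tuple[str, Arc, Optional[Arc]]]:
    """Return (token, arc_in_a, arc_in_b) for every token whose arc differs
    after mapping labels through `aliases` (e.g. {"dobj": "obj"})."""
    aliases = aliases or {}

    def normalize(arc: Arc) -> Arc:
        dep, head = arc
        return (aliases.get(dep, dep), head)

    diffs = []
    for token, arc_a in a.items():
        arc_b = b.get(token)
        if arc_b is None or normalize(arc_a) != normalize(arc_b):
            diffs.append((token, arc_a, arc_b))
    return diffs

# The StanfordNLP and CoreNLP parses from the question.
STANFORDNLP = {"Tom": ("nsubj", "made"), "made": ("ROOT", "ROOT"),
               "Sam": ("obj", "made"), "believe": ("ccomp", "made"),
               "that": ("mark", "has"), "Alice": ("nsubj", "has"),
               "has": ("ccomp", "believe"), "cancer": ("obj", "has"),
               ".": ("punct", "made")}
CORENLP = {**STANFORDNLP, "Sam": ("nsubj", "believe"), "cancer": ("dobj", "has")}

print(diff_parses(STANFORDNLP, CORENLP, {"dobj": "obj"}))
# [('Sam', ('obj', 'made'), ('nsubj', 'believe'))]
```

With the alias map, the only structural disagreement between the two tools on this sentence is the attachment of Sam; without it, the cancer arc is reported as well, purely because of the label name.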

This comes down to parser inaccuracy and the ambiguity of natural language.