Spacy says dependency parser not loaded

Question

I installed spaCy v2.0.2 on Ubuntu 16.04. Then I used

sudo python3 -m spacy download en

to download the English model.

After that, I used spaCy as follows:

from spacy.lang.en import English

p = English(parser=True, tagger=True, entity=True)
d = p("This is a sentence. I am who I am.")
print(list(d.sents))

But I get this error:

File "doc.pyx", line 511, in __get__
ValueError: Sentence boundary detection requires the dependency parse, which requires a statistical model to be installed and loaded. For more info, see the documentation: 
https://spacy.io/usage/models

I really can't figure out what is going on. I installed this version of the 'en' model:

https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz

I believe this is the default. Any help is appreciated. Thanks.

Answer 1

I think the problem here is quite simple: when you call

p = English(parser=True, tagger=True, entity=True)

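...spaCy loads the English language class, which contains the language data and special-case rules, but not the model data and weights that enable the parser, tagger and entity recognizer to make predictions. This is by design, because spaCy has no way of knowing whether you want to load model data and, if so, which package.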
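To make the distinction concrete, here is a toy, stdlib-only model of that behaviour. None of these classes are spaCy's: ToyLanguage, ToyDoc and the "parser" pipeline label are hypothetical stand-ins. The only point is that a language object without a trained parser produces no dependency parse, so asking for sentences fails, much like calling English() directly.

```python
class ToyDoc:
    """Hypothetical stand-in for a processed document (not spaCy's Doc)."""

    def __init__(self, text, is_parsed):
        self.text = text
        self.is_parsed = is_parsed

    @property
    def sents(self):
        # Sentence boundaries are derived from the dependency parse,
        # so without a parser there is nothing to derive them from.
        if not self.is_parsed:
            raise ValueError(
                "Sentence boundary detection requires the dependency parse"
            )
        # Crude placeholder split, standing in for parse-based boundaries.
        return [s.strip() + "." for s in self.text.split(".") if s.strip()]


class ToyLanguage:
    """Hypothetical stand-in for a language class such as English."""

    def __init__(self, pipeline=None):
        # A bare language class has language data but no trained components.
        self.pipeline = list(pipeline or [])

    def __call__(self, text):
        return ToyDoc(text, is_parsed="parser" in self.pipeline)


blank = ToyLanguage()                                # like English(): no model data
loaded = ToyLanguage(["tagger", "parser", "ner"])    # like a loaded model package

try:
    blank("This is a sentence. I am who I am.").sents
except ValueError as err:
    print("blank:", err)

print("loaded:", loaded("This is a sentence. I am who I am.").sents)
```

Passing keyword arguments such as parser=True to the class does not change this picture: the trained weights still have to come from an installed model package.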

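So if you want to load an English model, you have to use spacy.load(), which takes care of loading the data and putting the language and the processing pipeline together: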

import spacy

nlp = spacy.load('en_core_web_sm')  # name of model, shortcut name or path

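Under the hood, spacy.load() looks for an installed model package called en_core_web_sm, loads it and checks the model's meta data to determine which language the model needs (in this case, English) and which pipeline it supports (in this case, tagger, parser and NER). It then initializes an instance of English, creates the pipeline, loads the binary data from the model package and returns the object so you can call it on your text. See this section of the documentation for a more detailed explanation.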
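The sequence described above (read the meta data, pick the language class, build the pipeline, load the binary data) can be sketched in plain Python. This is not spaCy's implementation: the LANGUAGES registry, toy_load, load_weights and the "lang"/"pipeline" meta fields are hypothetical names chosen only to mirror the steps in the answer.

```python
import json


class English:
    """Hypothetical language class: language data only, no trained components."""

    def __init__(self):
        self.pipeline = []          # (name, component) pairs, in order
        self.weights_loaded = False

    def create_pipe(self, name):
        # A real component would wrap a statistical model; here it is a label.
        return {"name": name, "weights": None}

    def add_pipe(self, component):
        self.pipeline.append((component["name"], component))

    @property
    def pipe_names(self):
        return [name for name, _ in self.pipeline]

    def load_weights(self, weights):
        # Stand-in for loading the package's binary data into each component.
        for name, component in self.pipeline:
            component["weights"] = weights[name]
        self.weights_loaded = True
        return self


LANGUAGES = {"en": English}  # hypothetical language-code -> class registry


def toy_load(meta_json, weights):
    meta = json.loads(meta_json)             # 1. read the package meta data
    nlp = LANGUAGES[meta["lang"]]()          # 2. instantiate the language class
    for name in meta["pipeline"]:            # 3. create and add each component
        nlp.add_pipe(nlp.create_pipe(name))
    return nlp.load_weights(weights)         # 4. load the binary data


meta_json = '{"lang": "en", "pipeline": ["tagger", "parser", "ner"]}'
fake_weights = {"tagger": b"...", "parser": b"...", "ner": b"..."}
nlp = toy_load(meta_json, fake_weights)
print(nlp.pipe_names)  # ['tagger', 'parser', 'ner']
```

In this sketch a bare English() has an empty pipe_names list and no weights, which corresponds to the situation in the question.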
