What's the point of downloading 'models' when using SpaCy?
What does a model actually do?
I see this term everywhere in NLP and ML, and there doesn't seem to be one concrete definition.
What does a model accomplish in the context of NLP and SpaCy?
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u'This is a sentence.')
displacy.serve(doc, style='dep', options={'compact': True})
Not sure about SpaCy specifically, but in ML the term usually refers to the (mathematical?) model learned by the algorithm. From Wikipedia:
[Machine learning] Evolved from the study of pattern recognition and
computational learning theory in artificial intelligence, machine learning
explores the study and construction of algorithms that can learn from and make
predictions on data – such algorithms overcome following strictly static
program instructions by making data-driven predictions or decisions, through
building a ***model*** from sample inputs.
For example, in a neural network the model consists of the weights, the activation functions, and the way they are wired together. The model is what you train, and once trained, the model is what makes your predictions. It is your ML program, or at least the ML part of it, if you will.
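As a minimal sketch of that idea (the weight and bias values here are made up for illustration, not taken from any trained network), a "model" is nothing more than its learned parameters plus the activation function that combines them:

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The "model" is just these learned parameters plus the activation above.
# (Hypothetical values; training would normally set them from data.)
weights = [0.8, -0.5]
bias = 0.1

def predict(x):
    """Forward pass: weighted sum of inputs, then the activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

print(predict([1.0, 2.0]))  # a probability-like score in (0, 1)
```

Training would adjust `weights` and `bias` against sample data; prediction, as shown, is just applying them to new inputs.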
The spaCy 101 guide has a section on this – see here:
While some of spaCy's features work independently, others require statistical models to be loaded, which enable spaCy to predict linguistic annotations – for example, whether a word is a verb or a noun. spaCy currently offers statistical models for 8 languages, which can be installed as individual Python modules. Models can differ in size, speed, memory usage, accuracy and the data they include. The model you choose always depends on your use case and the texts you're working with. For a general-purpose use case, the small, default models are always a good start. They typically include the following components:
- Binary weights for the part-of-speech tagger, dependency parser and named entity recognizer to predict those annotations in context.
- Lexical entries in the vocabulary, i.e. words and their context-independent attributes like the shape or spelling.
- Word vectors, i.e. multi-dimensional meaning representations of words that let you determine how similar they are to each other.
- Configuration options, like the language and processing pipeline settings, to put spaCy in the correct state when you load in the model.
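For intuition on the "shape" attribute mentioned in the lexical-entries bullet, here is a simplified re-implementation in plain Python (spaCy's own `token.shape_` is computed internally; this sketch mimics its behavior of mapping characters to classes and capping runs at four):

```python
def word_shape(word):
    """Map uppercase -> X, lowercase -> x, digit -> d; keep other
    characters as-is; cap runs of the same class at 4 characters."""
    shape = []
    prev_cls = None
    run = 0
    for ch in word:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch
        run = run + 1 if cls == prev_cls else 1
        prev_cls = cls
        if run <= 4:
            shape.append(cls)
    return "".join(shape)

print(word_shape("Apple"))   # Xxxxx
print(word_shape("apples"))  # xxxx  (long lowercase run is capped)
```

Attributes like this are context-independent: they depend only on the string itself, which is why they live in the vocabulary rather than in the trained pipeline components.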
spaCy doesn't come with every model bundled in. While it handles most features on its own, for use cases that require predicting linguistic annotations, the statistical models have to be loaded as separate modules. I suppose that, since those extra modules only apply to specific use cases, the spaCy team chose not to bundle them with the core spaCy package.