Discrepancy between documentation and implementation of spaCy vectors for German words?

According to the documentation:

spaCy's small models (all packages that end in sm) don't ship with word vectors, and only include context-sensitive tensors. [...] individual tokens won't have any vectors assigned.

But when I use the de_core_news_sm model, tokens do have entries for x.vector, and x.has_vector is True.

These look like context vectors, but as I understand the documentation, only word vectors are accessible through the vector attribute, and the sm models should have none. Why does this work for a "small model"?

has_vector does not behave the way you might expect.

This was discussed in the comments of an issue raised on GitHub. The gist is that it returns True because vectors are available, even though those vectors are context vectors. Note that you can still use them, for example to compute similarity.

Quoting spaCy contributor Ines:

We've been going back and forth on how the has_vector should behave in cases like this. There is a vector, so having it return False would be misleading. Similarly, if the model doesn't come with a pre-trained vocab, technically all lexemes are OOV.

German word vectors have been announced for inclusion in version 2.1.0.