Discrepancy between documentation and implementation of spaCy vectors for German words?
spaCy's small models (all packages that end in sm) don't ship with
word vectors, and only include context-sensitive tensors. [...]
individual tokens won't have any vectors assigned.
But when I use the de_core_news_sm model, tokens do have entries for x.vector, and x.has_vector=True. It looks like these are context vectors, but as I understand the documentation, only word vectors are accessible through the vector attribute, and the sm models should have none. Why does this work for a "small model"?
has_vector behaves differently than you would expect. This is discussed in the comments of an issue raised on GitHub. The gist is that because vectors are available, it is True, even though those vectors are context vectors. Note that you can still use them, for example to compute similarity.
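As a minimal sketch of what that similarity computation amounts to: spaCy's similarity() is cosine similarity over the underlying vectors, and this works the same whether they are true word vectors or the context-sensitive tensors that the sm models fall back to. The vectors below are made-up stand-ins for token.vector values, not real model output:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their Euclidean norms. This is what spaCy computes
    # when you call token1.similarity(token2).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for two tokens' .vector values.
v1 = [1.0, 2.0, 0.0, 1.0]
v2 = [1.0, 2.0, 0.1, 0.9]
print(round(cosine_similarity(v1, v2), 3))
```

The result is close to 1.0 for near-identical vectors and close to 0.0 for orthogonal ones, which is why similarity scores remain usable even when the sm model only provides context tensors.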
Quoting spaCy contributor Ines:
We've been going back and forth on how the has_vector should behave in
cases like this. There is a vector, so having it return False would be
misleading. Similarly, if the model doesn't come with a pre-trained
vocab, technically all lexemes are OOV.
German word vectors have been announced for inclusion in version 2.1.0.