Why is a gensim FastText model smaller than Facebook's native fastText model?

It seems that Gensim's FastText implementation produces a smaller model than Facebook's native implementation. For a corpus of 1 million words, the native fastText model is 6 GB, while the Gensim FastText model is only 68 MB.

Is there any information stored by Facebook's implementation that is not present in Gensim's implementation?

Please show which models produced this comparison, or the process you used. It probably has bugs or misunderstandings.

The size of a model is influenced more by the number of unique words (and character n-gram buckets) than by the size of the corpus.
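To make that concrete, here is a rough back-of-the-envelope sketch. It assumes float32 storage, 100 dimensions and 2,000,000 n-gram buckets (which I believe are the defaults in both implementations), and it ignores the vocabulary dictionary and other metadata. The function name `estimate_fasttext_bytes` is made up for illustration; it is not part of either library.

```python
def estimate_fasttext_bytes(vocab_size, buckets=2_000_000, dim=100, bytes_per_float=4):
    """Rough size of the main arrays only (assumed layout, not an exact file format).

    Input side: one vector per full word plus one per n-gram bucket.
    Output side: one vector per full word.
    """
    input_rows = vocab_size + buckets
    output_rows = vocab_size
    return (input_rows + output_rows) * dim * bytes_per_float


# The bucket array dominates unless the vocabulary is very large.
for vocab in (10_000, 100_000):
    print(vocab, "unique words:", estimate_fasttext_bytes(vocab) // 1_000_000, "MB")
```

Under those assumptions the n-gram bucket array alone comes to about 800 MB, so a complete model with default settings is unlikely to fit in 68 MB, whichever library trained it.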

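The saved size of a Gensim-trained FastText model and that of a native Facebook fastText-trained model should be in roughly the same ballpark. Be sure to include all the subsidiary raw numpy files created by Gensim's .save() (ending in .npy, alongside the main save file) - all such files are required to .load() the model again!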
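A minimal sketch for totalling what a Gensim .save() actually wrote, assuming the subsidiary files sit in the same directory and share the main file's name as a prefix. The helper name `total_saved_bytes` and the path in the usage comment are hypothetical.

```python
from pathlib import Path


def total_saved_bytes(save_path):
    """Sum the main save file plus every sibling file whose name starts with it."""
    main = Path(save_path)
    return sum(
        p.stat().st_size
        for p in main.parent.iterdir()
        if p.is_file() and p.name.startswith(main.name)
    )


# Usage (hypothetical path):
# print(total_saved_bytes("ft.model") / 1e6, "MB")
```

Compare this total, not the size of the main file alone, against the size of the native .bin file.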

Similarly, if you load a Facebook fastText model into Gensim and then use Gensim's .save(), the total disk space taken by the two alternate formats should be quite close.

python

nlp

machine-learning

gensim

fasttext