Can I preserve the random state of a doc2vec model for each document I want to infer by inferring all documents at the same time?

Question

Is there a way to infer multiple documents at the same time with Gensim Doc2Vec, so that the random state of the model is preserved?

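The function infer_vector is defined as

infer_vector(doc_words, alpha=None, min_alpha=None, epochs=None, steps=None)

where doc_words (list of str) is the document for which the vector representation will be inferred. I can't find any other option for inferring multiple documents at the same time.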

Answer 1

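There is currently no option to infer multiple documents at once. It is one of many wishlist improvements for infer_vector() (collected in an open issue), but there is no work in progress or targeted release for it yet.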

I'm not sure what you mean by "preserve the random state of the model". The main motivations I can see for batching are user convenience, or added performance via multithreading.
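If convenience is the goal, a plain loop over infer_vector() gives the same result a batch call would. Here is a minimal sketch; the helper name infer_vectors is hypothetical (not part of Gensim), and it only assumes that the model object has an infer_vector() method that takes a list of tokens:

```python
from typing import Any, Iterable, List, Sequence


def infer_vectors(model: Any, docs: Iterable[Sequence[str]], **infer_kwargs: Any) -> List[Any]:
    """Hypothetical helper: call model.infer_vector() once per tokenized document.

    `model` is assumed to be a trained Doc2Vec model (or anything exposing
    infer_vector); `infer_kwargs` (for example epochs=...) are passed through
    unchanged to every call.
    """
    # One inference per document, in input order. Nothing is batched or
    # parallelized here; this is purely a convenience wrapper.
    return [model.infer_vector(list(doc_words), **infer_kwargs) for doc_words in docs]
```

You would call it as infer_vectors(model, tokenized_docs, epochs=20), where tokenized_docs is a list of token lists. It does not make inference any faster, and it does nothing about randomness between calls.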

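If what you really want is deterministic inference, see the answer in the Gensim FAQ which explains why deterministic Doc2Vec inference isn't necessarily a good idea. (It also includes a link to an issue with some ideas about how to force it, if you're determined to do so despite the good reasons not to.)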
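For illustration only, one idea of that kind is to reset the model's random number generator to the same seed before every inference. The sketch below assumes that the model exposes its generator as model.random with a seed() method (I believe Doc2Vec keeps a NumPy RandomState there, but treat that as an assumption and check it against your Gensim version). The helper name infer_vectors_reseeded is hypothetical.

```python
from typing import Any, Iterable, List, Sequence


def infer_vectors_reseeded(
    model: Any,
    docs: Iterable[Sequence[str]],
    seed: int = 0,
    **infer_kwargs: Any,
) -> List[Any]:
    """Hypothetical helper: reseed the model's RNG before each inference.

    Assumption: `model.random` is the generator used during inference and
    has a seed() method. This is not an official Gensim API for
    deterministic inference.
    """
    vectors = []
    for doc_words in docs:
        # Put the generator back into the same state, so every document
        # starts from an identical random stream.
        model.random.seed(seed)
        vectors.append(model.infer_vector(list(doc_words), **infer_kwargs))
    return vectors
```

With this, the same token list should come back with the same vector no matter where it sits in the input. The caveats from the FAQ still apply: the run-to-run jitter you are hiding is a real property of the algorithm, and if it is large, more epochs is usually the better fix.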
