Norms, Document Frequency and Suggestions in Elasticsearch

Question

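If I have a field called name and I use the suggest API to get suggestions for misspellings, do I need to have document frequencies or norms enabled in order to provide accurate suggestions? My assumption is yes, but I am curious whether there is a separate suggestions index in Lucene that handles frequency and/or norms even if I have them disabled for that field in my main index.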
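For concreteness, the setup described above might look like the following sketch. It only builds the request bodies as plain Python dicts and does not talk to a cluster. The typeless mapping layout, the suggestion label `name-suggestion`, and the misspelled sample text are assumptions made for illustration; `norms`, `index_options`, and the `term` suggester are the standard Elasticsearch settings the question refers to.

```python
import json

# Assumed mapping: a text field "name" with norms disabled and
# index_options set to "docs", so only document IDs are indexed
# (no term frequencies or positions).
name_mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "norms": False,
                "index_options": "docs",
            }
        }
    }
}

# Assumed suggest request: a term suggester on the same field.
# "name-suggestion" is an arbitrary label and the text is a sample typo.
suggest_request = {
    "suggest": {
        "name-suggestion": {
            "text": "elasticsaerch",
            "term": {"field": "name"},
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(name_mapping, indent=2))
    print(json.dumps(suggest_request, indent=2))
```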

Answer 1

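I doubt that the suggester can work without field-length normalization, because disabling norms means you are reduced to a binary signal (whether or not the term is present in the document field), which in turn affects the similarity score of each document.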

These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.
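As a rough illustration of how those three factors combine, here is a minimal sketch assuming the classic Lucene TF/IDF formulas (square root of the term frequency, `1 + ln(numDocs / (docFreq + 1))` for the inverse document frequency, and `1 / sqrt(numTerms)` for the field-length norm). The function names and the `norms_enabled` flag are hypothetical, and the real implementation stores norms in a lossy encoded form, so actual scores will differ slightly.

```python
import math


def term_frequency(freq):
    # Classic TF: square root of how often the term occurs in the field.
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    # Classic IDF: rarer terms across the index get a higher weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    # Shorter fields get a higher norm than longer ones.
    return 1.0 / math.sqrt(num_terms)


def term_weight(freq, num_docs, doc_freq, num_terms, norms_enabled=True):
    # With norms disabled (hypothetical flag), field length no longer
    # contributes, so the norm factor is treated as a constant 1.0.
    norm = field_length_norm(num_terms) if norms_enabled else 1.0
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * norm
    )


if __name__ == "__main__":
    # Same term statistics, scored with and without the norm factor.
    print(term_weight(freq=1, num_docs=1, doc_freq=0, num_terms=4))
    print(term_weight(freq=1, num_docs=1, doc_freq=0, num_terms=4,
                      norms_enabled=False))
```

With norms disabled, two documents that contain the term equally often receive the same weight regardless of how long the field is.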

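"but I am curious if maybe there is a separate suggestions index in lucene that handles frequency and/or norms even if I have it disabled for the field in my main index." As far as I know, any suggester will by default use the Vector Space Model to compute cosine similarity, which in turn relies on the tf-idf-norm based scoring computed for each term at index time to rank the suggestions. So I doubt that a suggester can score documents accurately without field norms.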


elasticsearch