Solr - No extra score for repeating words from query in document
I want a term match to be scored only once, not once per occurrence.
Ex - Search Query - Parle G Biscuits
Document 1 - Parle G Biscuits
Document 2 - Parle G Biscuits. I can eat 10 packets of Parle G Biscuits anytime.
Document 3 - Parle G Biscuits V2
I want to rank documents as Doc 1 > Doc 3 > Doc 2
Default answer from Solr - Doc 2 > Doc 1 > Doc 3
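This happens because the query string is found twice in the longer document. If I could somehow stop the second occurrence from being scored, I would get the desired result, since documents 2 and 3 would be penalized slightly for their greater length.
How can I modify Solr to work this way?
Thanks!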
If you don't need term positions (for example, if you don't search with phrases such as foo:"word1 word2"), you can set the field to drop any term frequency information, payloads and positions: omitTermFreqAndPositions="true".
If true, omits term frequency, positions, and payloads from postings for this field. This can be a performance boost for fields that don't require that information. It also reduces the storage space required for the index. Queries that rely on position that are issued on a field with this option will silently fail to find documents. This property defaults to true for all field types that are not text fields.
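As a rough sketch of the setting described above, the field definition in the schema might look like the following. The field name `product_name` and the type `text_general` are assumptions made for this example and do not come from the question. Index-time options like this one only apply to documents indexed after the change, so existing documents would need to be reindexed.

```xml
<!-- Hypothetical field: the name and type are placeholders for your own schema. -->
<!-- omitTermFreqAndPositions="true" drops term frequency, positions and payloads, -->
<!-- so a repeated term counts no more than a single occurrence. -->
<!-- Phrase queries against this field will stop matching. -->
<field name="product_name"
       type="text_general"
       indexed="true"
       stored="true"
       omitTermFreqAndPositions="true"/>
```

Length norms are controlled separately by omitNorms, so leaving that attribute at its default for text fields should keep the mild penalty for longer documents.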
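Since there is no separate setting that drops only term frequency, you'll have to implement a custom similarity if you need the other two features that this setting disables.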
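If you go the custom similarity route, the schema wiring could look roughly like this. The class `com.example.BinaryTfSimilarityFactory` is purely hypothetical: you would have to write it yourself in Java so that every non-zero term frequency contributes the same amount, and place the jar on Solr's classpath. The field type name `text_binary_tf` and the analyzer chain are also assumptions for the example.

```xml
<!-- Global similarity that lets each fieldType declare its own similarity. -->
<similarity class="solr.SchemaSimilarityFactory"/>

<!-- Hypothetical field type; positions and payloads stay available for phrase queries. -->
<fieldType name="text_binary_tf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Hypothetical class: not part of Solr, shown only to illustrate the wiring. -->
  <similarity class="com.example.BinaryTfSimilarityFactory"/>
</fieldType>
```

Fields that should ignore repeated terms would then use type="text_binary_tf", while other fields keep the default scoring.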