Function Score attribute to rank searches based on clicks not working with Elasticsearch and Rails
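I have implemented a function_score query in my Document model, which has a clicks field that tracks how many times each document has been viewed. Now I want documents with more clicks to get a higher priority and appear at the top of each search.
My document.rb code: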
require 'elasticsearch/model'

class Document < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks # re-index a document when it is saved (e.g. when clicks changes)

  def self.search(query)
    __elasticsearch__.search(
      {
        query: {
          function_score: {
            query: {
              multi_match: {
                query: query,
                fields: ['name', 'service'],
                fuzziness: "AUTO"
              }
            },
            # with the default boost_mode "multiply":
            # final score = text score * log10(1 + 2 * clicks)
            field_value_factor: {
              field: 'clicks',
              modifier: 'log1p',
              factor: 2
            }
          }
        }
      }
    )
  end

  settings index: {
    "number_of_shards": 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: { type: "custom", tokenizer: "standard",
                               filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"] }
      },
      # custom filters belong under "analysis" so the analyzer can reference them
      filter: {
        ascii_folding: { type: 'asciifolding', preserve_original: true },
        edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram: "20" }
      }
    }
  } do
    mapping do
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer", term_vector: "with_positions"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer", term_vector: "with_positions"
    end
  end
end
Here is the search view:
<h1>Document Search</h1>
<%= form_tag search_path, method: :get do %>
  <p>
    <%= label_tag :query, "Search for" %>
    <%= text_field_tag :query, params[:query] %>
    <%= submit_tag "Go", name: nil %>
  </p>
<% end %>
<% if @documents.present? %>
  <ul class="search_results">
    <% @documents.each do |document| %>
      <li>
        <h3>
          <%= link_to document.name, controller: "documents", action: "show", id: document._id %>
        </h3>
      </li>
    <% end %>
  </ul>
<% else %>
  <p>Your search did not match any documents.</p>
<% end %>
<br/>
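When I search for Estamp, I get the results in the following order:
Franking and Estamp   # clicks: 5
Notary and Estamp     # clicks: 8
Clearly, even though Notary and Estamp has more clicks, it does not come to the top of the search results. How can I achieve this?
This is what I get when I run it in the console: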
POST _search
"hits": {
  "total": 2,
  "max_score": 1.322861,
  "hits": [
    {
      "_index": "documents",
      "_type": "document",
      "_id": "13",
      "_score": 1.322861,
      "_source": {
        "id": 13,
        "name": "Franking and Estamp",
        "service": "Estamp",
        "user_id": 1,
        "clicks": 7
      }
    },
    {
      "_index": "documents",
      "_type": "document",
      "_id": "14",
      "_score": 0.29015404,
      "_source": {
        "id": 14,
        "name": "Notary and Estamp",
        "service": "Notary",
        "user_id": 1,
        "clicks": 12
      }
    }
  ]
}
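Here the documents' scores are not being updated according to the number of clicks.
It's hard to answer without seeing your indexed data. But looking at the query, one thing comes to mind, which I'll demonstrate with short examples:
Example 1:
I indexed the following documents: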
{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"Notary and Estamp", "service" :"text", "clicks": 8}
Running the same query you provided gives this result:
"hits": {
  "total": 2,
  "max_score": 4.333119,
  "hits": [
    {
      "_index": "script",
      "_type": "test",
      "_id": "AV2iwkems7jEvHyvnccV",
      "_score": 4.333119,
      "_source": {
        "name": "Notary and Estamp",
        "service": "text",
        "clicks": 8
      }
    },
    {
      "_index": "script",
      "_type": "test",
      "_id": "AV2iwo6ds7jEvHyvnccW",
      "_score": 3.6673431,
      "_source": {
        "name": "Franking and Estampy",
        "service": "text",
        "clicks": 5
      }
    }
  ]
}
So everything works as expected: the document with 8 clicks gets the higher score (the _score field value) and the order is correct.
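The ordering in Example 1 can be double-checked by hand. With the query's settings (modifier 'log1p', factor 2) and the default boost_mode of "multiply", Elasticsearch multiplies the text-match score by log10(1 + 2 × clicks). The minimal Ruby sketch below divides that multiplier back out of the two _score values above; it assumes no other boosts are involved, and the helper name log1p_multiplier is illustrative only.

```ruby
# Sketch: undo the field_value_factor multiplier from Example 1's scores.
# Assumes boost_mode "multiply" (the default) and no other boosts.

# The "log1p" modifier is the common (base-10) log of value + 1,
# applied to factor * field value.
def log1p_multiplier(clicks, factor: 2)
  Math.log10(factor * clicks + 1)
end

# name => [_score reported by Elasticsearch, clicks], copied from Example 1
example1 = {
  "Notary and Estamp"    => [4.333119, 8],
  "Franking and Estampy" => [3.6673431, 5]
}

example1.each do |name, (score, clicks)|
  multiplier = log1p_multiplier(clicks)
  text_score = score / multiplier
  puts format("%-21s clicks=%d multiplier=%.4f text score=%.4f",
              name, clicks, multiplier, text_score)
end
```

Both documents come out with a text score of about 3.52, so here the two names match equally well and the click multiplier alone decides the order.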
Example 2:
I noticed that in your query the name field is boosted by a high factor. So what happens if I index the following data?
{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"text", "service" :"Notary and Estamp", "clicks": 8}
Result:
"hits": {
  "total": 2,
  "max_score": 13.647502,
  "hits": [
    {
      "_index": "script",
      "_type": "test",
      "_id": "AV2iwo6ds7jEvHyvnccW",
      "_score": 13.647502,
      "_source": {
        "name": "Franking and Estampy",
        "service": "text",
        "clicks": 5
      }
    },
    {
      "_index": "script",
      "_type": "test",
      "_id": "AV2iwkems7jEvHyvnccV",
      "_score": 1.5597181,
      "_source": {
        "name": "text",
        "service": "Notary and Estamp",
        "clicks": 8
      }
    }
  ]
}
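Although Franking and Estampy has only 5 clicks, it scores much higher than the second document, which has more clicks.
So the point is that in your query, clicks are not the only factor that affects a document's score and its final position. Without your real data this is only my guess. You can run the query yourself with a REST client and inspect the scoring, the fields and the matched phrases.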
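Update
Based on your search results, the document with id=13 has the term Estamp in both fields (name and service). That is why it gets the higher score: in the scoring algorithm, having the term in both fields outweighs having more clicks. If you want the clicks field to have a bigger influence on the score, try tuning factor (it should probably be higher) and modifier ("modifier": "square" may work in your case). You can check the possible values in the Elasticsearch field_value_factor documentation.
Try this combination: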
{
  "query": {
    "function_score": {
      "query": {
        ... // same multi_match query as before
      },
      "field_value_factor": {
        "field": "clicks",
        "modifier": "square",
        "factor": 3
      }
    }
  }
}
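Whether this combination is enough can be estimated from the question's own console output. The sketch below assumes the default boost_mode of "multiply", no other boosts, and that each document's text-match score stays the same when only field_value_factor changes. Under those assumptions it divides the original log1p multiplier out of the two _score values and re-ranks the documents with other settings. MODIFIERS, multiplier and ranking are hypothetical names that mirror the documented modifier(factor × value) formula for two of the modifiers.

```ruby
# Sketch: re-rank the question's two documents under different
# field_value_factor settings. Assumptions: boost_mode "multiply" (default),
# no other boosts, and an unchanged text-match score per document.
MODIFIERS = {
  "log1p"  => ->(v) { Math.log10(v + 1) }, # common log of value + 1
  "square" => ->(v) { v * v }
}.freeze

# field_value_factor computes modifier(factor * field value)
def multiplier(clicks, modifier:, factor:)
  MODIFIERS.fetch(modifier).call((factor * clicks).to_f)
end

# id => [_score from the question's console output, clicks]
observed = { 13 => [1.322861, 7], 14 => [0.29015404, 12] }

# Divide out the original log1p / factor 2 multiplier to estimate the text score
text_scores = observed.transform_values do |score, clicks|
  [score / multiplier(clicks, modifier: "log1p", factor: 2), clicks]
end

# Document ids ordered by estimated final score, highest first
def ranking(text_scores, modifier:, factor:)
  text_scores
    .sort_by { |_id, (text, clicks)| -text * multiplier(clicks, modifier: modifier, factor: factor) }
    .map(&:first)
end

[["log1p", 2], ["square", 3], ["square", 30]].each do |modifier, factor|
  puts "#{modifier}, factor #{factor}: #{ranking(text_scores, modifier: modifier, factor: factor).inspect}"
end
```

Under these assumptions every setting keeps document 13 first: its estimated text score is more than five times that of document 14, while square only widens the click gap to (12/7)² ≈ 2.94. With square (or no modifier) the factor also cancels out when two documents are compared, so raising it does not change the order. Removing the text score from the final score altogether may be the more reliable route.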
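Update 2 - scoring based on clicks only
If the only thing that should affect the score is the value of the clicks field, you can try "boost_mode": "replace". In that case only the function score is used and the query score is ignored, so how often the term Estamp appears in the name and service fields no longer affects the score. Try this query: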
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "Estamp",
          "fields": ["name", "service"],
          "fuzziness": "AUTO"
        }
      },
      "field_value_factor": {
        "field": "clicks",
        "factor": 1
      },
      "boost_mode": "replace"
    }
  }
}
It gave me:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 5,
    "hits": [
      {
        "_index": "script",
        "_type": "test",
        "_id": "AV2nI0HkJPYn0YKQxRvd",
        "_score": 5,
        "_source": {
          "name": "Notary and Estamp",
          "service": "Notary",
          "clicks": 5
        }
      },
      {
        "_index": "script",
        "_type": "test",
        "_id": "AV2nIwKvJPYn0YKQxRvc",
        "_score": 4,
        "_source": {
          "name": "Franking and Estamp",
          "service": "Estamp",
          "clicks": 4
        }
      }
    ]
  }
}
This may be what you're looking for (note that the values "_score": 5 and "_score": 4 match the number of clicks).
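To use the Update 2 query from the Rails model in the question, the request body can be built as a plain Ruby hash and passed to __elasticsearch__.search, as the question's self.search already does. The sketch below is one possible shape, not a definitive implementation: the ClickRankedQuery module and its body method are hypothetical names, and missing: 0 (a real field_value_factor parameter) is an added assumption so that documents without a clicks value are treated as having 0 clicks.

```ruby
# Sketch: Update 2's request body as a Ruby hash, so only the clicks value
# decides the score (boost_mode "replace" ignores the text-match score).
# ClickRankedQuery is a hypothetical helper name.
module ClickRankedQuery
  module_function

  def body(query)
    {
      query: {
        function_score: {
          # still decides WHICH documents match
          query: {
            multi_match: {
              query: query,
              fields: %w[name service],
              fuzziness: "AUTO"
            }
          },
          # decides the ORDER: score = 1 * clicks
          field_value_factor: {
            field: "clicks",
            factor: 1,
            missing: 0 # assumption: documents without clicks count as 0 clicks
          },
          boost_mode: "replace"
        }
      }
    }
  end
end

# Inside Document (sketch):
#   def self.search(query)
#     __elasticsearch__.search(ClickRankedQuery.body(query))
#   end
p ClickRankedQuery.body("Estamp").dig(:query, :function_score, :boost_mode)
```

Keeping the multi_match clause means the search still only returns documents that match the text; the clicks value only changes their order.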