同义词,将权重存储在文档中以在 Elastic Search 中进行相关性评分
Synonyms, storing weights in document for relevance scoring in Elastic Search
故事:给定下面的示例文档并通过扩展它们,是否有可能获得以下排名:
"Cereals" 的搜索结果如下
- 玉米片
- 脆米饼
在 "Rice" 上的搜索结果如下
- 印度香米
- 脆米饼
执行搜索的文档:
[{
name: "Cornflakes"
},
{
name: "Basmati"
},
{
name: "Rice Krispies"
}]
当然,其中一些甚至没有搜索词,所以一个选项是添加一个带有文本值和权重的同义词数组,这将有助于计算排名:
[{
name: "Cornflakes",
synonyms: [
{t: 'Cereals', weight: 100},
{t: 'Sugar', weight: 100}]
},
{
name: "Basmati",
synonyms: [
{t: 'Cereals', weight: 1},
{t: 'Rice', weight: 1000}]
},
{
name: "Rice Krispies",
synonyms: [
{t: 'Cereals', weight: 10},
{t: 'Rice', weight: 1}]
}]
这是正确的方法吗?
考虑加权同义词的 Elastic Search 查询是什么?
我认为 "tags" 比 "synonyms" 更适合该字段的名称。
您可以使用 nested type to store tags and use function score 将 tags.weight 字段(最佳匹配标签的值,如果有的话)的值与 名称字段。
一个这样的实现可能如下所示:
put test
put test/tag_doc/_mapping
{
"properties" : {
"tags" : {
"type" : "nested" ,
"properties": {
"t" : {"type" : "string"},
"weight" : {"type" : "double"}
}
}
}
}
put test/tag_doc/_bulk
{ "index" : { "_index" : "test", "_type" : "tag_doc", "_id":1} }
{"name": "Cornflakes","tags": [{"t": "Cereals", "weight":100},{"t": "Sugar", "weight": 100}]}
{ "index" : { "_index" : "test", "_type" : "tag_doc","_id":2} }
{ "name": "Basmati","tags": [{"t": "Cereals", "weight": 1},{"t": "Rice", "weight": 1000}]}
{ "index" : { "_index" : "test", "_type" : "tag_doc","_id":3} }
{ "name": "Rice Krispies", "tags": [{"t": "Cereals", "weight": 10},{"t": "Rice", "weight": 1}]}
post test/_search
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"name": {
"query": "cereals",
"boost": 100
}
}
},
{
"nested": {
"path": "tags",
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "tags.weight"
}
}
],
"query": {
"match": {
"tags.t": "cereals"
}
},
"boost_mode": "replace",
"score_mode": "max"
}
},
"score_mode": "max"
}
}
]
}
}
}
结果 :
"hits": {
"total": 3,
"max_score": 100,
"hits": [
{
"_index": "test",
"_type": "tag_doc",
"_id": "1",
"_score": 100,
"_source": {
"name": "Cornflakes",
"tags": [
{
"t": "Cereals",
"weight": 100
},
{
"t": "Sugar",
"weight": 100
}
]
}
},
{
"_index": "test",
"_type": "tag_doc",
"_id": "3",
"_score": 10,
"_source": {
"name": "Rice Krispies",
"tags": [
{
"t": "Cereals",
"weight": 10
},
{
"t": "Rice",
"weight": 1
}
]
}
},
{
"_index": "test",
"_type": "tag_doc",
"_id": "2",
"_score": 1,
"_source": {
"name": "Basmati",
"tags": [
{
"t": "Cereals",
"weight": 1
},
{
"t": "Rice",
"weight": 1000
}
]
}
}
]
}
故事:给定下面的示例文档并通过扩展它们,是否有可能获得以下排名:
"Cereals" 的搜索结果如下
- 玉米片
- 脆米饼
在 "Rice" 上的搜索结果如下
- 印度香米
- 脆米饼
执行搜索的文档:
[{
name: "Cornflakes"
},
{
name: "Basmati"
},
{
name: "Rice Krispies"
}]
当然,其中一些甚至没有搜索词,所以一个选项是添加一个带有文本值和权重的同义词数组,这将有助于计算排名:
[{
name: "Cornflakes",
synonyms: [
{t: 'Cereals', weight: 100},
{t: 'Sugar', weight: 100}]
},
{
name: "Basmati",
synonyms: [
{t: 'Cereals', weight: 1},
{t: 'Rice', weight: 1000}]
},
{
name: "Rice Krispies",
synonyms: [
{t: 'Cereals', weight: 10},
{t: 'Rice', weight: 1}]
}]
这是正确的方法吗?
考虑加权同义词的 Elastic Search 查询是什么?
我认为 "tags" 比 "synonyms" 更适合该字段的名称。 您可以使用 nested type to store tags and use function score 将 tags.weight 字段(最佳匹配标签的值,如果有的话)的值与 名称字段。
一个这样的实现可能如下所示:
put test
put test/tag_doc/_mapping
{
"properties" : {
"tags" : {
"type" : "nested" ,
"properties": {
"t" : {"type" : "string"},
"weight" : {"type" : "double"}
}
}
}
}
put test/tag_doc/_bulk
{ "index" : { "_index" : "test", "_type" : "tag_doc", "_id":1} }
{"name": "Cornflakes","tags": [{"t": "Cereals", "weight":100},{"t": "Sugar", "weight": 100}]}
{ "index" : { "_index" : "test", "_type" : "tag_doc","_id":2} }
{ "name": "Basmati","tags": [{"t": "Cereals", "weight": 1},{"t": "Rice", "weight": 1000}]}
{ "index" : { "_index" : "test", "_type" : "tag_doc","_id":3} }
{ "name": "Rice Krispies", "tags": [{"t": "Cereals", "weight": 10},{"t": "Rice", "weight": 1}]}
post test/_search
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"name": {
"query": "cereals",
"boost": 100
}
}
},
{
"nested": {
"path": "tags",
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "tags.weight"
}
}
],
"query": {
"match": {
"tags.t": "cereals"
}
},
"boost_mode": "replace",
"score_mode": "max"
}
},
"score_mode": "max"
}
}
]
}
}
}
结果 :
"hits": {
"total": 3,
"max_score": 100,
"hits": [
{
"_index": "test",
"_type": "tag_doc",
"_id": "1",
"_score": 100,
"_source": {
"name": "Cornflakes",
"tags": [
{
"t": "Cereals",
"weight": 100
},
{
"t": "Sugar",
"weight": 100
}
]
}
},
{
"_index": "test",
"_type": "tag_doc",
"_id": "3",
"_score": 10,
"_source": {
"name": "Rice Krispies",
"tags": [
{
"t": "Cereals",
"weight": 10
},
{
"t": "Rice",
"weight": 1
}
]
}
},
{
"_index": "test",
"_type": "tag_doc",
"_id": "2",
"_score": 1,
"_source": {
"name": "Basmati",
"tags": [
{
"t": "Cereals",
"weight": 1
},
{
"t": "Rice",
"weight": 1000
}
]
}
}
]
}