Synonyms: storing weights in the document for relevance scoring in Elasticsearch

Story: given the example documents below, is it possible, by extending them, to get the following ranking:

Documents to search:

[{
  name: "Cornflakes"
},
{
  name: "Basmati"
},
{
  name: "Rice Krispies"
}]

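Of course, some of these do not even contain the search term, so one option is to add a synonyms array of text values with weights, which would help compute the ranking: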

[{
  name: "Cornflakes",
  synonyms: [
    {t: 'Cereals', weight: 100},
    {t: 'Sugar', weight: 100}]
},
{
  name: "Basmati",
  synonyms: [
    {t: 'Cereals', weight: 1},
    {t: 'Rice', weight: 1000}]
},
{
  name: "Rice Krispies",
  synonyms: [
    {t: 'Cereals', weight: 10},
    {t: 'Rice', weight: 1}]
}]

Is this the right approach?

What would the Elasticsearch query look like that takes the weighted synonyms into account?

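I think "tags" is a more appropriate name for this field than "synonyms". You can use a nested type to store the tags, and a function_score query to combine the value of the tags.weight field (the value of the best-matching tag, if any) with the match on the name field.

One such implementation could look like this: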

put test

put test/tag_doc/_mapping
{
    "properties" : {
        "tags" : { 
            "type" : "nested" ,
            "properties": {
                "t" : {"type" : "string"},
                "weight" : {"type" : "double"}
             }

        }   
    }
}

put test/tag_doc/_bulk
{ "index" : { "_index" : "test", "_type" : "tag_doc", "_id":1} }
{"name": "Cornflakes","tags": [{"t": "Cereals", "weight":100},{"t": "Sugar", "weight": 100}]}
{ "index" : { "_index" : "test", "_type" : "tag_doc","_id":2} }
{ "name": "Basmati","tags": [{"t": "Cereals", "weight": 1},{"t": "Rice", "weight": 1000}]}
{ "index" : { "_index" : "test", "_type" : "tag_doc","_id":3} }
{ "name": "Rice Krispies", "tags": [{"t": "Cereals", "weight": 10},{"t": "Rice", "weight": 1}]}


post test/_search
{
   "query": {
      "dis_max": {
         "queries": [
            {
               "match": {
                  "name": {
                     "query": "cereals",
                     "boost": 100
                  }
               }
            },
            {
               "nested": {
                  "path": "tags",
                  "query": {
                     "function_score": {
                        "functions": [
                           {
                              "field_value_factor": {
                                 "field": "tags.weight"
                              }
                           }
                        ],
                        "query": {
                           "match": {
                              "tags.t": "cereals"
                           }
                        },
                        "boost_mode": "replace",
                        "score_mode": "max"
                     }
                  },
                  "score_mode": "max"
               }
            }
         ]
      }
   }
}
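
If the request is sent from application code rather than typed into a console, the same body can be generated for any search term. The following is a minimal sketch in Python using only the standard library. The function name build_query and the name_boost parameter are made up for illustration and are not part of any Elasticsearch client; actually sending the request is left out.

```python
import json


def build_query(term, name_boost=100):
    """Return the search body shown above as a plain dict, for any term.

    build_query and name_boost are hypothetical names for this sketch.
    """
    return {
        "query": {
            "dis_max": {
                "queries": [
                    # Clause 1: full-text match on the name field, boosted.
                    {"match": {"name": {"query": term, "boost": name_boost}}},
                    # Clause 2: match on the nested tags, scored by tags.weight.
                    {
                        "nested": {
                            "path": "tags",
                            "score_mode": "max",
                            "query": {
                                "function_score": {
                                    "query": {"match": {"tags.t": term}},
                                    "functions": [
                                        {"field_value_factor": {"field": "tags.weight"}}
                                    ],
                                    "boost_mode": "replace",
                                    "score_mode": "max",
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the body for the search term used in the request above.
    print(json.dumps(build_query("cereals"), indent=3))
```

The dict mirrors the request above key for key, so json.dumps(build_query("cereals")) can be used as the body of post test/_search.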

Result of the search request:

"hits": {
      "total": 3,
      "max_score": 100,
      "hits": [
         {
            "_index": "test",
            "_type": "tag_doc",
            "_id": "1",
            "_score": 100,
            "_source": {
               "name": "Cornflakes",
               "tags": [
                  {
                     "t": "Cereals",
                     "weight": 100
                  },
                  {
                     "t": "Sugar",
                     "weight": 100
                  }
               ]
            }
         },
         {
            "_index": "test",
            "_type": "tag_doc",
            "_id": "3",
            "_score": 10,
            "_source": {
               "name": "Rice Krispies",
               "tags": [
                  {
                     "t": "Cereals",
                     "weight": 10
                  },
                  {
                     "t": "Rice",
                     "weight": 1
                  }
               ]
            }
         },
         {
            "_index": "test",
            "_type": "tag_doc",
            "_id": "2",
            "_score": 1,
            "_source": {
               "name": "Basmati",
               "tags": [
                  {
                     "t": "Cereals",
                     "weight": 1
                  },
                  {
                     "t": "Rice",
                     "weight": 1000
                  }
               ]
            }
         }
      ]
   }
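
The scores 100, 10 and 1 follow from how the clauses combine: "boost_mode": "replace" makes the tag weight the score of a matching tag, "score_mode": "max" on the nested query keeps the best matching tag per document, and dis_max keeps the higher of the name clause and the tag clause. The sketch below mimics that arithmetic in plain Python without talking to Elasticsearch. The helper names are invented for illustration, the tokenizer is a crude stand-in for the real analyzer, and the name clause is assumed to score exactly its boost on a match, which the real text relevance score would not.

```python
# Plain-Python model of the scoring above; no Elasticsearch involved.
DOCS = [
    {"name": "Cornflakes",
     "tags": [{"t": "Cereals", "weight": 100}, {"t": "Sugar", "weight": 100}]},
    {"name": "Basmati",
     "tags": [{"t": "Cereals", "weight": 1}, {"t": "Rice", "weight": 1000}]},
    {"name": "Rice Krispies",
     "tags": [{"t": "Cereals", "weight": 10}, {"t": "Rice", "weight": 1}]},
]


def tokens(text):
    """Crude stand-in for the analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def tag_score(doc, term):
    """Nested clause: the weight replaces the text score, best matching tag wins."""
    weights = [tag["weight"] for tag in doc["tags"] if term in tokens(tag["t"])]
    return max(weights) if weights else None


def name_score(doc, term, boost=100):
    """Name clause. ASSUMPTION: a match scores exactly `boost`; the real
    value would be the text relevance score multiplied by the boost."""
    return boost if term in tokens(doc["name"]) else None


def rank(docs, query):
    """Return (name, score) pairs for matching documents, best first."""
    term = query.lower()
    scored = []
    for doc in docs:
        candidates = [s for s in (name_score(doc, term), tag_score(doc, term))
                      if s is not None]
        if candidates:
            # dis_max with the default tie_breaker: the best clause wins.
            scored.append((doc["name"], max(candidates)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank(DOCS, "cereals"))
    # [('Cornflakes', 100), ('Rice Krispies', 10), ('Basmati', 1)]
```

For "cereals" no name matches, so each document's score is simply the weight of its Cereals tag, which reproduces the order in the result above.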