Building an effective Elasticsearch query for cross_fields with fuzziness

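I understand that Elasticsearch does not support fuzziness for the cross_fields type of a multi_match query. I have had a lot of trouble with the Elasticsearch API, so I am finding it hard to build an equivalent query that searches across multiple document fields with fuzzy string matching.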

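I have an index named papers with fields such as Title, Author.FirstName, Author.LastName, PublicationDate, Journal, and so on. I want to be able to query it with a string like "John Doe paper title 2015 journal name". cross_fields is the ideal multi_match type, but it does not support fuzziness, which is critical for my application.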

Can anyone suggest a reasonable way to solve this? I have spent hours looking for solutions on SO and the Elasticsearch forums, with little success.

You can use a copy_to field for this scenario. You copy the values of the different fields into one new field (my_search_field in the example below), and on that field you can run a fuzzy search by setting the fuzziness parameter on a simple match query.

Below are a sample mapping, a sample document, and a query:

Mapping (my_search_field is the combined target field; every other field points to it with copy_to):

PUT my_fuzzy_index
{
  "mappings": {
    "properties": {
      "my_search_field": {
        "type": "text"
      },
      "Title": {
        "type": "text",
        "copy_to": "my_search_field"
      },
      "Author": {
        "type": "nested",
        "properties": {
          "FirstName": {
            "type": "text",
            "copy_to": "my_search_field"
          },
          "LastName": {
            "type": "text",
            "copy_to": "my_search_field"
          }
        }
      },
      "PublicationDate": {
        "type": "date",
        "copy_to": "my_search_field"
      },
      "Journal": {
        "type": "text",
        "copy_to": "my_search_field"
      }
    }
  }
}

Sample document:

POST my_fuzzy_index/_doc/1
{
  "Title": "Fountainhead",
  "Author":[
    {
      "FirstName": "Ayn",
      "LastName": "Rand"
    }
  ],
  "PublicationDate": "2015",
  "Journal": "journal"
}

Query request (fuzziness is the maximum edit distance allowed per term; Elasticsearch supports at most 2, so it is set to 2 here):

POST my_fuzzy_index/_search
{
  "query": {
    "match": {
      "my_search_field": {
        "query": "Aynnn Ranaad Fountainhead 2015 journal",
        "fuzziness": 2
      }
    }
  }
}
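
To see why the misspelled terms in that query can still match, it may help to check the edit distances by hand. The sketch below is plain Python, not Elasticsearch code: levenshtein is a helper written only for this illustration, and it assumes the default standard analyzer has lowercased both the query terms and the indexed terms. Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this simple version does not do, but that makes no difference for these two terms.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Misspelled query terms versus the indexed terms, all lowercased.
pairs = [("aynnn", "ayn"), ("ranaad", "rand")]
distances = {query_term: levenshtein(query_term, indexed)
             for query_term, indexed in pairs}
print(distances)  # {'aynnn': 2, 'ranaad': 2}
```

Both terms are exactly two edits away from what is indexed, which is the largest distance fuzzy matching allows. A term that is three or more edits away would not match on its own, although the document can still be returned through the other terms because match uses OR between terms by default.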

Response:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.1027813,
    "hits" : [
      {
        "_index" : "my_fuzzy_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1027813,
        "_source" : {
          "Title" : "Fountainhead",
          "Author" : [
            {
              "FirstName" : "Ayn",
              "LastName" : "Rand"
            }
          ],
          "PublicationDate" : "2015",
          "Journal" : "journal"
        }
      }
    ]
  }
}

So rather than trying to apply a fuzzy query across several fields, use this approach. It keeps your query simple.
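
As a rough illustration of how little is left to build on the application side, here is a minimal Python sketch that assembles the request body for the combined field. build_fuzzy_query is a hypothetical helper, not part of any Elasticsearch client, and the defaults are assumptions: "AUTO" lets Elasticsearch pick the allowed edits from the term length (none for 1-2 characters, one for 3-5, two for longer terms), and operator is the standard match option that controls whether any term or every term must match.

```python
import json


def build_fuzzy_query(text: str, fuzziness="AUTO", operator: str = "or") -> dict:
    """Build a match query body against the copy_to target field.

    Hypothetical helper for illustration; the field name my_search_field
    comes from the mapping in this answer.
    """
    return {
        "query": {
            "match": {
                "my_search_field": {
                    "query": text,
                    "fuzziness": fuzziness,  # "AUTO", 0, 1 or 2
                    "operator": operator,    # "or": any term, "and": all terms
                }
            }
        }
    }


body = build_fuzzy_query("John Doe paper title 2015 journal name")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of POST my_fuzzy_index/_search with whatever HTTP or Elasticsearch client you already use. Note that "AUTO" is stricter than a fixed 2 for short terms, so a five-letter misspelling such as "Aynnn" would only be allowed one edit.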

Let me know if this helps!