Elastic Search: Any way to make space-separated words in a comma-separated list regarded as one term?

Question

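I don't know whether this is possible, but I'm trying to search by location with an "exact search" option. Several fields are searched, the most important being the "location_raw" field: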

"match": {
    "location.location_raw": {
        "type": "boolean",
        "operator": "AND",
        "query": "[location query]",
        "analyzer": "standard"
     }
}

The location_raw field is a location string with each place separated by a comma, e.g. "Sudbury, Middlesex, Massachusetts" or "Leamington, Warwickshire, England". If someone searches for "Sudbury, Middlesex", it is passed in as

"query": "Sudbury Middlesex"

and both terms must be present in the location_raw field. This part works.

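The problem is that when the location_raw field contains a multi-word place, such as New York or Saint George, those documents are returned when someone searches for "York" or "George". If I do an exact search for "George", I don't want results for "Saint George". Is there any way to make Elastic treat "Saint George" as a single term in the string "Saint George, Stamford, Lincoln, England"?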
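To make the problem concrete, here is a minimal Python sketch of why a word-level analyzer produces the unwanted hit. It is only a rough stand-in for Elasticsearch's analysis, not its actual implementation: the helper names `word_terms` and `matches_all` are hypothetical, and the tokenizer is assumed to lowercase the text and split on any non-alphanumeric character.

```python
import re

def word_terms(text):
    # Assumed stand-in for a word-level analyzer: lowercase the text,
    # then split on anything that is not a letter or digit.
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}

def matches_all(query, field_value):
    # "operator": "AND" semantics: every query term must be among the field's terms.
    return word_terms(query) <= word_terms(field_value)

doc = "Saint George, Stamford, Lincoln, England"
print(matches_all("George", doc))        # True: the unwanted hit
print(matches_all("Saint George", doc))  # True
print(matches_all("Boston", doc))        # False
```

Because "Saint George" is indexed as the two separate terms "saint" and "george", a query for "George" alone already satisfies the AND condition.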

Answer 1

Here is one way to do it, but you will have to query in csv as well, or else use a terms filter.

I used a pattern analyzer with a simple pattern: ", ". I set up a simple index with a single document:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "analyzer": {
            "csv": {
               "type": "pattern",
               "pattern": ", ",
               "lowercase": false
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "location": {
               "type": "string",
               "index_analyzer": "csv",
               "search_analyzer": "standard",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"location":"Saint George, Stamford, Lincoln, England"}

I can see the terms that were generated by using a simple terms aggregation:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "location_terms": {
         "terms": {
            "field": "location"
         }
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "location_terms": {
         "buckets": [
            {
               "key": "England",
               "doc_count": 1
            },
            {
               "key": "Lincoln",
               "doc_count": 1
            },
            {
               "key": "Saint George",
               "doc_count": 1
            },
            {
               "key": "Stamford",
               "doc_count": 1
            }
         ]
      }
   }
}
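The buckets above are what you would expect from splitting the field value on the ", " separator and keeping each piece verbatim. As a rough illustration outside Elasticsearch, the following Python sketch mimics that split; `csv_terms` is a hypothetical helper, and the more tolerant `,\s*` pattern is an assumption of mine, not something the mapping above uses.

```python
import re

def csv_terms(text, pattern=", "):
    # Split on the separator pattern and keep each piece as-is
    # (no lowercasing, mirroring "lowercase": false in the analyzer settings).
    return [t for t in re.split(pattern, text) if t]

terms = csv_terms("Saint George, Stamford, Lincoln, England")
print(sorted(terms))  # ['England', 'Lincoln', 'Saint George', 'Stamford']

# A value without the space after the comma is not split by ", " ...
print(csv_terms("Saint George,Stamford"))            # ['Saint George,Stamford']
# ... whereas a pattern such as ",\s*" (an assumption) tolerates both forms.
print(csv_terms("Saint George,Stamford", r",\s*"))   # ['Saint George', 'Stamford']
```

Note that the terms keep their original case, so matching against them is case-sensitive.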

Then, if I query using the same csv syntax, no document is returned for "George, England":

POST /test_index/_search
{
   "query": {
      "match": {
         "location": {
            "type": "boolean",
            "operator": "AND",
            "query": "George, England",
            "analyzer": "csv"
         }
      }
   }
}
...
{
   "took": 0,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   }
}

But it does work for "Saint George, England":

POST /test_index/_search
{
   "query": {
      "match": {
         "location": {
            "type": "boolean",
            "operator": "AND",
            "query": "Saint George, England",
            "analyzer": "csv"
         }
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2169777,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.2169777,
            "_source": {
               "location": "Saint George, Stamford, Lincoln, England"
            }
         }
      ]
   }
}
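The two results above follow from comparing whole comma-separated pieces rather than individual words. A small Python sketch of that comparison, assuming both the query and the field are split on ", " and that all query terms must be present (the AND operator); `csv_terms` and `csv_and_match` are hypothetical names used only for illustration:

```python
def csv_terms(text, sep=", "):
    # Each comma-separated piece becomes one term, spaces included.
    return {part for part in text.split(sep) if part}

def csv_and_match(query, field_value):
    # AND semantics: the query's terms must be a subset of the field's terms.
    return csv_terms(query) <= csv_terms(field_value)

doc = "Saint George, Stamford, Lincoln, England"
print(csv_and_match("George, England", doc))        # False
print(csv_and_match("Saint George, England", doc))  # True
print(csv_and_match("saint george, England", doc))  # False: terms are case-sensitive
```

The last line reflects the "lowercase": false setting: with that analyzer, the user's input has to match the stored casing, so it may be worth normalizing case at both index and query time if that is too strict.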

This query is equivalent, and probably more performant:

POST /test_index/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "terms": {
               "location": [
                  "Saint George",
                  "England"
               ],
               "execution": "and"
            }
         }
      }
   }
}
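Since the terms filter is not analyzed, the application has to split the user's input into terms itself. The sketch below shows one way that might look in Python; `build_terms_filter` is a hypothetical helper, and the request body simply reproduces the structure of the query above (the `filtered` query and the `execution` option are taken as-is from this answer and are tied to the Elasticsearch version it was written for).

```python
import json

def build_terms_filter(user_input, field="location"):
    # Split the raw input on commas and trim whitespace so that
    # "Saint George, England" becomes ["Saint George", "England"].
    terms = [part.strip() for part in user_input.split(",") if part.strip()]
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # "execution": "and" requires every term to be present.
                "filter": {"terms": {field: terms, "execution": "and"}},
            }
        }
    }

body = build_terms_filter("Saint George, England")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of the POST /test_index/_search request with whichever HTTP client you prefer.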

Here is the code I used to test it:

http://sense.qbox.io/gist/234ea93accb7b20ad8fd33e62fe92f1d450a51ab
