Elasticsearch spatial search with terms

Question

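I am running queries against more than 1.4 million documents with spatial data. A purely spatial query is very fast (under 1 second). Adding a wildcard clause to the same geometry makes the query take roughly 10-20 seconds. I expect the wildcard query to take some time, but I am wondering whether there is a better way to write the query, or a way to trick Elasticsearch into filtering the results down to the geometry first and only then looking for wildcard matches. Or perhaps run the spatial query and then run the wildcard on the resulting document IDs? Any ideas that could get faster results for the end user would be much appreciated.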

GET parcels/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "name.keyword": {
              "value": "*smith*"
            }
          }
        },
        {
          "bool": {
            "filter": [
              {
                "geo_shape": {
                  "shape": {
                    "shape": {
                      "type": "POLYGON",
                      "coordinates": [
                        [
                          [
                            -81.09980486601305,
                            32.063655184739936
                          ],
                          [
                            -81.09980486601168,
                            32.05639855631687
                          ],
                          [
                            -81.09128330779276,
                            32.05639855631687
                          ],
                          [
                            -81.09128330779276,
                            32.06365489826756
                          ],
                          [
                            -81.09980486601305,
                            32.063655184739936
                          ]
                        ]
                      ]
                    },
                    "relation": "intersects"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 10000
}

Our index settings:

{
...
"analysis": {
    "normalizer": {
        "search_normalizer": {
            "filter": [
                "uppercase"
            ],
            "type": "custom"
         }
     }
},
"number_of_shards": 8,
"number_of_replicas": 1,

Mapping for the 'name' field:

"name": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "normalizer": "search_normalizer"
        }
    }
},

Running ES 7.10 (5 nodes, each with 8GB RAM).

Not searching by wildcard is not an option.

Any help is appreciated.

Answer 1

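Running a wildcard search with a leading wildcard (such as *smith*) on a keyword field will kill performance! With a leading wildcard, Elasticsearch cannot narrow the search by prefix, so it has to check every term in the field.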

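If you absolutely need this functionality, you should leverage the new wildcard field type (introduced in Elasticsearch 7.9, so it is available on your 7.10 cluster), which was designed precisely for this use case.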

So you can either add another sub-field or change the keyword sub-field to a `wildcard` sub-field.
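
As a rough sketch of the first option, the mapping update below adds a wildcard-typed sub-field next to the existing keyword one. The sub-field name "wildcard" (queried as name.wildcard) is only an illustrative choice, not something from the question; the index name, the text type, and the search_normalizer are copied from the question.

```json
# Sketch only: "wildcard" is an assumed sub-field name; everything else mirrors the question's mapping.
PUT parcels/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "normalizer": "search_normalizer"
        },
        "wildcard": {
          "type": "wildcard"
        }
      }
    }
  }
}
```

Documents that are already indexed will not have the new sub-field populated until they are indexed again. One way to do that in place is POST parcels/_update_by_query?conflicts=proceed; reindexing into a fresh index should work as well.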

You can see how it works in the blog article that described the wildcard field when it was released.
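
For illustration, here is how the query from the question might look once a wildcard-typed sub-field exists. It assumes that sub-field is called name.wildcard (a hypothetical name) and that matching should ignore case, which the case_insensitive option (added in 7.10) takes care of. Both clauses sit in filter context, since neither needs relevance scoring here. Treat it as a starting point to benchmark rather than a guaranteed fix.

```json
# Sketch only: name.wildcard is an assumed sub-field of type "wildcard"; polygon and size are from the question.
GET parcels/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_shape": {
            "shape": {
              "shape": {
                "type": "POLYGON",
                "coordinates": [
                  [
                    [-81.09980486601305, 32.063655184739936],
                    [-81.09980486601168, 32.05639855631687],
                    [-81.09128330779276, 32.05639855631687],
                    [-81.09128330779276, 32.06365489826756],
                    [-81.09980486601305, 32.063655184739936]
                  ]
                ]
              },
              "relation": "intersects"
            }
          }
        },
        {
          "wildcard": {
            "name.wildcard": {
              "value": "*smith*",
              "case_insensitive": true
            }
          }
        }
      ]
    }
  },
  "size": 10000
}
```

Note that Elasticsearch chooses the execution order of the clauses itself, so listing the geo_shape filter first does not force it to run first. The expected gain comes from the wildcard field type being much cheaper to search with a leading wildcard.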

spatial-index

elasticsearch