Can you make Elasticsearch 7.x spatial queries more precise?

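Background: we have about 140 million polygons split across 5 indices (region-[1-5]), each with 2 shards. The data is loaded into ES 7.10. The field containing the polygon is named 'shape' and is mapped as a geo_shape field.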

Here is an example of an indexed shape:

"shape": {
  "type": "Polygon",
  "coordinates": [
    [
      [
        -80.661103428642,
        28.0213473946004
      ],
      [
        -80.6611091545036,
        28.0210035893407
      ],
      [
        -80.6615120749597,
        28.021009053184
      ],
      [
        -80.6615063490981,
        28.0213528568402
      ],
      [
        -80.661103428642,
        28.0213473946004
      ]
    ]
  ]
},

Our problem occurs when we query for polygons that intersect a given (usually hand-drawn) shape. For example:

GET region_parcels*/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "shape": {
            "shape": {
              "type": "POLYGON",
              "coordinates": [
                [
                  [
                    -81.0864386380646,
                    32.07339101099513
                  ],
                  [
                    -81.0890350163911,
                    32.07282734995984
                  ],
                  [
                    -81.08907793173533,
                    32.07190002908301
                  ],
                  [
                    -81.08796213278512,
                    32.07151818834138
                  ],
                  [
                    -81.08648155340886,
                    32.071481822473295
                  ],
                  [
                    -81.08459327826233,
                    32.07231823378
                  ],
                  [
                    -81.0841426671478,
                    32.073136454828834
                  ],
                  [
                    -81.08480785498352,
                    32.073645566452704
                  ],
                  [
                    -81.08527992377016,
                    32.07390012120158
                  ],
                  [
                    -81.08530138144226,
                    32.07390012120158
                  ],
                  [
                    -81.0864386380646,
                    32.07339101099513
                  ]
                ]
              ]
            },
            "relation": "intersects"
          }
        }
      }
    }
  },
  "size": 1000
}

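When we run the query above, we get some results that lie up to 30 feet outside the drawn polygon. The false positives are not uniform, so we can't simply apply a negative buffer to the search polygon to return the correct intersections. We also used a point placed in the middle of one of our indexed polygons as the search geometry, and got back the intersecting polygon plus some of the surrounding polygons.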

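From reading the docs and blogs, it seems that specifying any kind of precision is still possible but will soon be deprecated, and that the new tessellation technique used for indexing should be accurate to within a few millimeters out of the box.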

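Is there any way to set up the index/cluster, or to run the query differently, that we have overlooked and that would make spatial intersection queries more accurate?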

Thanks.

Edit

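Here is an actual example that uses a point in the center of one polygon. It returns 3 hits: the polygon that intersects the point (correct) and one polygon on either side of it (incorrect).

Request: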

GET region_parcels*/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "shape": {
            "shape": {
              "type": "POINT",
              "coordinates": [
                -81.08111523359743,
                32.04772418111284
              ]
            },
            "relation": "intersects"
          }
        }
      }
    }
  },
  "_source": ["shape"],
  "explain": true,
  "size": 1000
}

Response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_shard" : "[<my_index>][0]",
        "_node" : "lrSfQEyVTWmWU828O6Qdsw",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "cY2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [
                  -81.0810247260436,
                  32.0478338967803
                ],
                [
                  -81.0811253535251,
                  32.0475727349866
                ],
                [
                  -81.0812173428069,
                  32.0475984458201
                ],
                [
                  -81.0811167162237,
                  32.0478596090633
                ],
                [
                  -81.0810247260436,
                  32.0478338967803
                ]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      },
      {
        "_shard" : "[<my_index>][0]",
        "_node" : "lrSfQEyVTWmWU828O6Qdsw",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "dI2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [
                  -81.0809327358636,
                  32.0478081852515
                ],
                [
                  -81.0810333624468,
                  32.0475470233845
                ],
                [
                  -81.0811253535251,
                  32.0475727349866
                ],
                [
                  -81.0810247260436,
                  32.0478338967803
                ],
                [
                  -81.0809327358636,
                  32.0478081852515
                ]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      },
      {
        "_shard" : "[<my_index>][1]",
        "_node" : "8jO4hXBuQL-cGobekTsjwg",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "cI2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [
                  -81.0811167162237,
                  32.0478596090633
                ],
                [
                  -81.0812173428069,
                  32.0475984458201
                ],
                [
                  -81.0813093320886,
                  32.0476241574079
                ],
                [
                  -81.0812087064037,
                  32.0478853205776
                ],
                [
                  -81.0811167162237,
                  32.0478596090633
                ]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      }
    ]
  }
}
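
To confirm which of the three hits are genuine, the query point can be tested against each returned ring outside Elasticsearch. This is a minimal sketch, assuming it is acceptable to treat longitude/latitude as planar coordinates at this scale; `point_in_ring` is an illustrative helper, not part of any Elasticsearch API. The coordinates are copied from the response above.

```python
def point_in_ring(point, ring):
    """Ray-casting point-in-polygon test on a closed GeoJSON ring.

    Assumption: coordinates are treated as planar (x = lon, y = lat).
    """
    px, py = point
    inside = False
    # Walk consecutive vertex pairs; the ring repeats its first vertex at the end.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


query_point = (-81.08111523359743, 32.04772418111284)

# Outer rings of the three hits, keyed by document _id.
hit_rings = {
    "cY2O9XcBlBVQyhnplhLN": [
        (-81.0810247260436, 32.0478338967803),
        (-81.0811253535251, 32.0475727349866),
        (-81.0812173428069, 32.0475984458201),
        (-81.0811167162237, 32.0478596090633),
        (-81.0810247260436, 32.0478338967803),
    ],
    "dI2O9XcBlBVQyhnplhLN": [
        (-81.0809327358636, 32.0478081852515),
        (-81.0810333624468, 32.0475470233845),
        (-81.0811253535251, 32.0475727349866),
        (-81.0810247260436, 32.0478338967803),
        (-81.0809327358636, 32.0478081852515),
    ],
    "cI2O9XcBlBVQyhnplhLN": [
        (-81.0811167162237, 32.0478596090633),
        (-81.0812173428069, 32.0475984458201),
        (-81.0813093320886, 32.0476241574079),
        (-81.0812087064037, 32.0478853205776),
        (-81.0811167162237, 32.0478596090633),
    ],
}

results = {doc_id: point_in_ring(query_point, ring)
           for doc_id, ring in hit_rings.items()}
for doc_id, contains in results.items():
    print(doc_id, "contains point" if contains else "does NOT contain point")
```

Under that planar assumption, only the first hit (cY2O9XcBlBVQyhnplhLN) contains the point; the other two are the neighbouring polygons returned as false positives.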

It turns out that the mapping for our shape field had been explicitly set with the property strategy: "recursive".

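When we created the component template mapping for the field, we set 'ignore malformed' to true under 'advanced settings' in Kibana. From then on, whenever we loaded data into the index, it automatically used the old tree structure. This must be a bug, because you would not expect that turning on one of the advanced settings would also set the tree type. I was able to reproduce the behavior with a new mapping and index.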
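
A mapping like this can be caught by scanning the output of GET <index>/_mapping for the prefix-tree parameters that the 7.x geo_shape documentation lists as deprecated. The following is a rough sketch, assuming the mapping response has already been parsed into a Python dict; the function names and the sample response are hypothetical and only mimic the shape of a 7.x mapping response.

```python
# Parameters listed as deprecated prefix-tree settings in the
# Elasticsearch 7.x geo_shape field documentation.
LEGACY_GEO_SHAPE_PARAMS = {
    "tree", "precision", "tree_levels",
    "strategy", "distance_error_pct", "points_only",
}


def find_legacy_geo_shape_fields(mapping_response):
    """Return {(index, field_path): [legacy params]} for geo_shape fields
    whose mapping contains any prefix-tree parameter."""
    findings = {}
    for index_name, body in mapping_response.items():
        properties = body.get("mappings", {}).get("properties", {})
        _walk(index_name, "", properties, findings)
    return findings


def _walk(index_name, prefix, properties, findings):
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "geo_shape":
            legacy = sorted(LEGACY_GEO_SHAPE_PARAMS.intersection(spec))
            if legacy:
                findings[(index_name, path)] = legacy
        # Object fields nest their children under "properties".
        if "properties" in spec:
            _walk(index_name, path + ".", spec["properties"], findings)


# Hypothetical sample, shaped like a 7.x GET <index>/_mapping response.
sample_mapping = {
    "region-1": {"mappings": {"properties": {
        "shape": {"type": "geo_shape",
                  "ignore_malformed": True,
                  "strategy": "recursive"},
    }}},
    "region-2": {"mappings": {"properties": {
        "shape": {"type": "geo_shape", "ignore_malformed": True},
    }}},
}

legacy_fields = find_legacy_geo_shape_fields(sample_mapping)
print(legacy_fields)
```

Any field reported here is using the legacy prefix-tree indexing, which matches the IntersectsPrefixTreeQuery shown in the explain output of the affected searches.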

Because we wanted to keep the 'ignore malformed' option, I recreated the mapping by loading it as JSON:

"shape": {
   "type": "geo_shape",
   "ignore_malformed": true
}

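This preserved our option, and when we loaded data into the index it used the default tree. We confirmed this by re-running our earlier searches, which are now very accurate (to within inches, if not better).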