Can you make Elasticsearch 7.x spatial queries more precise?
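Background:
We have about 140 million polygons split across 5 indices (region-[1-5]), each with 2 shards, loaded into ES 7.10. The field that holds the polygons is named 'shape' and is mapped as a geo_shape field.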
Here is an example of an indexed shape:
"shape": {
  "type": "Polygon",
  "coordinates": [
    [
      [-80.661103428642, 28.0213473946004],
      [-80.6611091545036, 28.0210035893407],
      [-80.6615120749597, 28.021009053184],
      [-80.6615063490981, 28.0213528568402],
      [-80.661103428642, 28.0213473946004]
    ]
  ]
},
Our problem occurs when we query for polygons that intersect a given (usually hand-drawn) shape. For example:
GET region_parcels*/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "shape": {
            "shape": {
              "type": "POLYGON",
              "coordinates": [
                [
                  [-81.0864386380646, 32.07339101099513],
                  [-81.0890350163911, 32.07282734995984],
                  [-81.08907793173533, 32.07190002908301],
                  [-81.08796213278512, 32.07151818834138],
                  [-81.08648155340886, 32.071481822473295],
                  [-81.08459327826233, 32.07231823378],
                  [-81.0841426671478, 32.073136454828834],
                  [-81.08480785498352, 32.073645566452704],
                  [-81.08527992377016, 32.07390012120158],
                  [-81.08530138144226, 32.07390012120158],
                  [-81.0864386380646, 32.07339101099513]
                ]
              ]
            },
            "relation": "intersects"
          }
        }
      }
    }
  },
  "size": 1000
}
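Because the search shapes are usually hand-drawn, it can help to build the request body programmatically. The Python sketch below produces the same bool/filter/geo_shape structure as the query above. The helper name intersects_filter is hypothetical and is not part of any Elasticsearch client. GeoJSON (RFC 7946) requires a ring's first and last positions to be identical, so the helper closes the ring if the drawn polygon is not already closed. The resulting dictionary could then be sent with whatever HTTP or Elasticsearch client you already use.

```python
import json


def intersects_filter(field, ring, size=1000):
    """Build a bool/filter geo_shape 'intersects' query body for one polygon ring.

    ring is a list of [lon, lat] pairs. If the hand-drawn ring is not closed,
    the first vertex is appended so the first and last positions match.
    (Hypothetical helper for illustration only.)
    """
    coords = [list(p) for p in ring]
    if coords and coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    # A closed ring needs at least 4 positions (3 distinct vertices + closure).
    if len(coords) < 4:
        raise ValueError("a polygon ring needs at least three distinct vertices")
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "Polygon", "coordinates": [coords]},
                            "relation": "intersects",
                        }
                    }
                }
            }
        },
        "size": size,
    }


# An unclosed, hand-drawn triangle: the helper appends the first vertex.
drawn = [[-81.0864, 32.0734], [-81.0890, 32.0728], [-81.0891, 32.0719]]
body = intersects_filter("shape", drawn)
print(json.dumps(body, indent=2))
```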
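When we run the query above, some of the results lie up to 30 feet outside the drawn polygon. The false positives are not uniform, so we cannot simply apply a negative buffer to our search polygon to get back the correct intersections. We also used a point in the middle of one of our indexed polygons as the search geometry and got back the polygon it intersects as well as some of the surrounding polygons.
From reading the documentation and blogs, it seems that specifying any kind of precision is still possible but deprecated, and that the new tessellation technique used for indexing should be accurate to within a few millimetres out of the box.
Is there any way to configure the index or cluster, or to run the query differently, that we have overlooked and that would make spatial intersection queries more accurate?
Thanks.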
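EDIT
Here is a real example that uses a point at the center of one polygon. It returns 3 hits: the polygon the point falls in (correct) and one polygon on either side of it (incorrect).
Request: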
GET region_parcels*/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "shape": {
            "shape": {
              "type": "POINT",
              "coordinates": [-81.08111523359743, 32.04772418111284]
            },
            "relation": "intersects"
          }
        }
      }
    }
  },
  "_source": ["shape"],
  "explain": true,
  "size": 1000
}
Response:
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_shard" : "[<my_index>][0]",
        "_node" : "lrSfQEyVTWmWU828O6Qdsw",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "cY2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [-81.0810247260436, 32.0478338967803],
                [-81.0811253535251, 32.0475727349866],
                [-81.0812173428069, 32.0475984458201],
                [-81.0811167162237, 32.0478596090633],
                [-81.0810247260436, 32.0478338967803]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      },
      {
        "_shard" : "<my_index>[0]",
        "_node" : "lrSfQEyVTWmWU828O6Qdsw",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "dI2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [-81.0809327358636, 32.0478081852515],
                [-81.0810333624468, 32.0475470233845],
                [-81.0811253535251, 32.0475727349866],
                [-81.0810247260436, 32.0478338967803],
                [-81.0809327358636, 32.0478081852515]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      },
      {
        "_shard" : "[<my_index>][1]",
        "_node" : "8jO4hXBuQL-cGobekTsjwg",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "cI2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [-81.0811167162237, 32.0478596090633],
                [-81.0812173428069, 32.0475984458201],
                [-81.0813093320886, 32.0476241574079],
                [-81.0812087064037, 32.0478853205776],
                [-81.0811167162237, 32.0478596090633]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      }
    ]
  }
}
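As a sanity check independent of Elasticsearch, a standard ray-casting point-in-polygon test on the three returned rings shows that only the first hit actually contains the query point. This is a minimal sketch under one stated assumption: it treats longitude and latitude as flat x/y coordinates, which is a reasonable approximation for parcels only a few metres across. The helper name point_in_ring is our own and is not part of any library.

```python
# Planar ray-casting point-in-polygon test. Assumes lon/lat can be treated as
# flat x/y coordinates, which is adequate at parcel scale.

def point_in_ring(x, y, ring):
    """Return True if (x, y) lies inside the closed ring [[lon, lat], ...]."""
    inside = False
    for i in range(len(ring) - 1):  # closed ring: last vertex repeats the first
        x1, y1 = ring[i]
        x2, y2 = ring[i + 1]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies on the ray cast towards +x (east)
                inside = not inside
    return inside


query_point = (-81.08111523359743, 32.04772418111284)

# Rings copied from the three hits in the response above.
hits = {
    "cY2O9XcBlBVQyhnplhLN": [
        [-81.0810247260436, 32.0478338967803], [-81.0811253535251, 32.0475727349866],
        [-81.0812173428069, 32.0475984458201], [-81.0811167162237, 32.0478596090633],
        [-81.0810247260436, 32.0478338967803],
    ],
    "dI2O9XcBlBVQyhnplhLN": [
        [-81.0809327358636, 32.0478081852515], [-81.0810333624468, 32.0475470233845],
        [-81.0811253535251, 32.0475727349866], [-81.0810247260436, 32.0478338967803],
        [-81.0809327358636, 32.0478081852515],
    ],
    "cI2O9XcBlBVQyhnplhLN": [
        [-81.0811167162237, 32.0478596090633], [-81.0812173428069, 32.0475984458201],
        [-81.0813093320886, 32.0476241574079], [-81.0812087064037, 32.0478853205776],
        [-81.0811167162237, 32.0478596090633],
    ],
}

results = {doc_id: point_in_ring(*query_point, ring) for doc_id, ring in hits.items()}
for doc_id, contained in results.items():
    print(doc_id, contained)
```

The two neighbouring parcels share an edge with the correct one, and the query point sits only a few metres from those shared edges. That is close enough for an approximate, grid-based match to pick them up.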
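It turns out that the mapping for our shape field had been explicitly set with the property strategy: "recursive".
When we created the component template mapping for the field, we set 'ignore malformed' to true under 'advanced settings' in Kibana. From then on, whenever we loaded data into the index, it automatically used the legacy prefix tree structure. This must be a bug, because you would not expect one of the advanced settings to set the tree type. I was able to reproduce the behavior with a new mapping and a new index.
Because we wanted to keep the 'ignore malformed' option, I recreated the mapping by loading the JSON directly: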
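The '_explanation' in the response above shows why the matches were loose. The query ran as an IntersectsPrefixTreeQuery with detailLevel=21, so shapes were approximated by grid cells rather than matched exactly. The sketch below gives a rough order-of-magnitude estimate of the cell size at that level. It rests on two assumptions. First, the tree is a quadtree; we believe that is the legacy default in 7.x, but check the 'tree' setting in your own mapping. Second, each level halves the 360° × 180° world in both directions. The 111,320 m-per-degree constant is a spherical approximation, and the function name is illustrative.

```python
import math

METRES_PER_DEGREE = 111_320.0  # spherical approximation of one degree of latitude
FEET_PER_METRE = 3.28084


def quadtree_cell_size_m(level, latitude_deg):
    """Approximate (width, height) in metres of a quadtree cell at `level`.

    Assumes the world (360 x 180 degrees) is halved along both axes at every
    level, so a cell spans 360 / 2**level by 180 / 2**level degrees.
    """
    width_deg = 360.0 / 2 ** level
    height_deg = 180.0 / 2 ** level
    # Degrees of longitude shrink with cos(latitude); latitude degrees do not.
    width_m = width_deg * METRES_PER_DEGREE * math.cos(math.radians(latitude_deg))
    height_m = height_deg * METRES_PER_DEGREE
    return width_m, height_m


# detailLevel=21 from the _explanation, at the latitude of the example parcels.
w, h = quadtree_cell_size_m(21, 32.0477)
print(f"~{w:.1f} m x {h:.1f} m "
      f"(~{w * FEET_PER_METRE:.0f} ft x {h * FEET_PER_METRE:.0f} ft)")
```

Under these assumptions a cell near latitude 32° is roughly 16 m wide and 10 m tall. That is the same order of magnitude as the 30-foot false positives described in the question. It also explains why the errors are not uniform: they depend on where each parcel edge falls relative to the grid.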
"shape": {
  "type": "geo_shape",
  "ignore_malformed": true
}
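This kept our option, and when we loaded data into the index it used the default indexing (the BKD tree) instead of the legacy prefix tree. We confirmed this by re-running our earlier searches, which are now very accurate (to within inches, if not better).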
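To spot the same problem in other indices, one option is to fetch the mappings (for example with GET region_parcels*/_mapping) and look for legacy prefix-tree parameters on geo_shape fields. Our understanding of Elasticsearch 7.x is that setting any of strategy, tree, tree_levels, precision, distance_error_pct or points_only makes the field fall back to the deprecated prefix-tree indexing. The Python sketch below walks a dictionary shaped like the _mapping response and reports such fields. The function name and the sample mapping are illustrative assumptions, not output from the indices in this post.

```python
# Legacy prefix-tree parameters for geo_shape fields in Elasticsearch 7.x
# (assumption: any one of them switches the field to prefix-tree indexing).
LEGACY_GEO_SHAPE_PARAMS = {
    "strategy", "tree", "tree_levels", "precision",
    "distance_error_pct", "points_only",
}


def find_legacy_geo_shape_fields(mapping_response):
    """Return sorted (index, field_path, legacy_params) tuples from a GET _mapping body."""
    found = []

    def walk(properties, prefix, index):
        for name, spec in properties.items():
            path = prefix + name
            if spec.get("type") == "geo_shape":
                legacy = sorted(spec.keys() & LEGACY_GEO_SHAPE_PARAMS)
                if legacy:
                    found.append((index, path, legacy))
            # Recurse into object / nested fields that have sub-properties.
            if "properties" in spec:
                walk(spec["properties"], path + ".", index)

    for index, body in mapping_response.items():
        walk(body.get("mappings", {}).get("properties", {}), "", index)
    return sorted(found)


# Hypothetical sample: one index with the accidental legacy mapping, one fixed.
sample = {
    "region-1": {"mappings": {"properties": {
        "shape": {"type": "geo_shape", "strategy": "recursive", "ignore_malformed": True},
    }}},
    "region-2": {"mappings": {"properties": {
        "shape": {"type": "geo_shape", "ignore_malformed": True},
    }}},
}

print(find_legacy_geo_shape_fields(sample))  # [('region-1', 'shape', ['strategy'])]
```

Fields flagged this way need to be remapped without the legacy parameters and reindexed, as was done above for the shape field.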