Return Elasticsearch distance for array of geo points

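I need to return the distance for each of multiple geo points stored in an array in every Elasticsearch document. As of now, my results only contain a single distance calculated for the whole array.

I started with the code from the following Stack Overflow question: Return distance in elasticsearch results?

My Elasticsearch query body contains this: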

{
  "stored_fields" : [ "_source" ],
  "script_fields" : {
    "distance" : {
      "script" : {
        "inline": "doc['locations.facility.address.coordinates'].arcDistance(params.lat,params.lon) * 0.001",
        "lang": "painless",
        "params": {
          "lat": 2.27,
          "lon": 50.3
        }
      }
    }
  }
}

And my Elasticsearch source documents look similar to this when returned. (Note that locations is an array.)

"locations": [
    {
      "facility": {
        "address": {
          "country_code": "US",
          "city": "San Diego",
          "coordinates": {
            "lon": -117.165,
            "lat": 32.8408
          },
          "country_name": "United States",
          "state_province": "California",
          "postal_code": "92123"
        }
      }
    },
    {
      "facility": {
        "address": {
          "country_code": "US",
          "city": "Tampa",
          "coordinates": {
            "lon": -82.505,
            "lat": 28.0831
          },
          "country_name": "United States",
          "state_province": "Florida",
          "postal_code": "33613"
        }
      }
    }

]

Currently, my results look similar to this:

    "fields": {
      "distance": [
        13952.518249603361
      ]
    }

But in the distance array, I need to return one value for each entry in 'locations'.

This one's tricky.

According to the documentation and the source code, the arcDistance method is only available on the doc values, not on the individual geo point instances.

In other words, while we can iterate over doc['locations.facility.address.coordinates'], the items we iterate over don't implement any geo distance methods.

That's a shame. So we have to implement our own geo distance function, perhaps using the haversine formula:

{
  "stored_fields": [
    "_source"
  ],
  "script_fields": {
    "distance": {
      "script": {
        "inline": """
          float distFrom(float lat1, float lng1, float lat2, float lng2) {
            double earthRadius = 6371000; // meters
            double dLat = Math.toRadians(lat2-lat1);
            double dLng = Math.toRadians(lng2-lng1);
            double a = Math.sin(dLat/2) * Math.sin(dLat/2) +
                       Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) *
                       Math.sin(dLng/2) * Math.sin(dLng/2);
            double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
            float dist = (float) (earthRadius * c);
            
            return dist;
          }
        
          return params._source.locations.stream().map(location -> {
              def lat = (float) location.facility.address.coordinates.lat;
              def lon = (float) location.facility.address.coordinates.lon;
              return distFrom(lat, lon, (float) params.lat, (float) params.lon) * 0.001;
          }).collect(Collectors.toList())
        """,
        "lang": "painless",
        "params": {
          "lat": 2.27,
          "lon": 50.3
        }
      }
    }
  }
}

yielding

"hits" : {
  ...
  "hits" : [
    {
      ...
      "_source" : {
        "locations" : [
          { ... },
          { ... }
        ]
      },
      "fields" : {
        "distance" : [
          15894.470000000001,
          13952.498
        ]
      }
    }
  ]
}

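Truth be told, when this much scripting is required, something has gone wrong.

In general, scripts should be avoided.

But more importantly, when you're not sorting by these geo distances, the whole computation should be done outside of Elasticsearch, in the place where you post-process the search results. I use Turf for JavaScript geo calculations, for example.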
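
For illustration, here is a minimal sketch of that post-processing step in plain Python (my choice for the example; it does not use Turf). It assumes each hit has the same `_source.locations[].facility.address.coordinates` shape as the document in the question. The helper names `haversine_km` and `distances_for_hit` and the sample `hit` are hypothetical, not part of any Elasticsearch client.

```python
import math

# Mean Earth radius in kilometres (the Painless script above uses 6371000 m).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    d_lat = math.radians(lat2 - lat1)
    d_lon = math.radians(lon2 - lon1)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(d_lon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def distances_for_hit(hit, lat, lon):
    """Return one distance per entry of the hit's `locations` array."""
    result = []
    for location in hit["_source"].get("locations", []):
        coords = location["facility"]["address"]["coordinates"]
        result.append(haversine_km(coords["lat"], coords["lon"], lat, lon))
    return result


# Hypothetical search hit, trimmed to the fields the calculation needs.
hit = {
    "_source": {
        "locations": [
            {"facility": {"address": {"coordinates": {"lon": -117.165, "lat": 32.8408}}}},
            {"facility": {"address": {"coordinates": {"lon": -82.505, "lat": 28.0831}}}},
        ]
    }
}

# Same reference point as the `params` in the query.
distances = distances_for_hit(hit, lat=2.27, lon=50.3)
```

With this approach the query no longer needs `script_fields` at all; you would run `distances_for_hit` over every entry of `hits.hits` in the response.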

Finally, when you store multiple locations/facilities in an array, I'd recommend using nested fields. They prevent array flattening, plus support