Using NestedPath in Script Sort Elastic Search doesn't allow accessing outer properties

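I need to sort by a value computed from two logical parts in a script. For each document, the minimum of the HQ distance and the office distances from a given location should be computed and returned for sorting. Since I can return only one value, I need to combine the script that computes the distance between the headquarters and the given location with the one that computes the distances between multiple offices and the given location.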

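I tried to combine them, but offices is a nested property and headquarters is a non-nested property. If I use "NestedPath", I somehow cannot access the headquarters property. Without "NestedPath", I cannot use the offices property. Here is the mapping: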

         "offices" : {
            "type" : "nested",
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          },
        "headquarters" : {
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          }

Here is the script I tried:

 "sort": [
    {
      "_script": {
        "nested" : {
          "path" : "offices"
        },
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": "def hqDistance = 1000000;if (!doc['headquarters.coordinates'].empty){hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;} def officeDistance= doc['offices.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371; if (hqDistance < officeDistance) { return hqDistance; } return officeDistance;"
        },
        "type": "Number"
      }
    }
  ],

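When I run the script, it seems the headquarters logic is not even executed; I only get results based on the office distance.

Can someone help me solve this? Thanks.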

Nested fields operate in a separate context: their content cannot be accessed from the outer level, nor vice versa.

You can, however, access the document's raw _source (params._source inside the script).

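But there's a catch:

  • When iterating under the offices nested path, you can call .arcDistance because the coordinates are of type ScriptDocValues.GeoPoint.
  • But once you access the raw _source, you are dealing with a set of unoptimized java.util.ArrayLists and java.util.HashMaps.

This means that even though you can iterate an array list: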

...
for (def office : params._source['offices']) {
   // office.coordinates is a trivial HashMap of {lat, lon}!
}

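you cannot compute the geo distance directly…

…unless you write your own geoDistance function, which is perfectly fine with Painless, but it needs to be defined at the top of the script.

No need to reinvent the wheel, though: see "Calculating distance between two points, using latitude longitude?"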
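For reference, here is a minimal standalone sketch of such a distance function in plain Java (Painless syntax is close to Java, so the body should port with little change). It uses the haversine formula, which is one common choice and not necessarily what the linked question settles on. The class name, method name, and the Earth-radius constant are my own assumptions, not anything provided by Elasticsearch.

```java
// Sketch only: HaversineSketch and haversineMiles are hypothetical names,
// not part of Elasticsearch or Painless.
class HaversineSketch {
    // Assumed mean Earth radius in statute miles.
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Great-circle distance between two lat/lon points (degrees), in miles.
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Office at 39.9,-74.92 measured against the query point 28.9672,-98.4786.
        System.out.println(haversineMiles(39.9, -74.92, 28.9672, -98.4786));
    }
}
```

Running something like this outside the cluster is a cheap way to sanity-check the math before pasting it into a script.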

Example implementation

Assuming your documents look like this:

POST my-index/_doc
{
  "offices": [
    {
      "coordinates": "39.9,-74.92",
      "state": "New Jersey"
    }
  ],
  "headquarters": {
    "coordinates": {
      "lat": 40.7128,
      "lon": -74.006
    },
    "state": "NYC"
  }
}

Your sort script could then look like this:

GET my-index/_search
{
   "sort": [
    {
      "_script": {
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": """
            // We can declare functions at the beginning of a Painless script
            // https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-functions.html#painless-functions
            
            double deg2rad(double deg) {
              return (deg * Math.PI / 180.0);
            }
            
            double rad2deg(double rad) {
              return (rad * 180.0 / Math.PI);
            }
            
            // great-circle distance via the spherical law of cosines, in statute miles
            double geoDistanceInMiles(def lat1, def lon1, def lat2, def lon2) {
              double theta = lon1 - lon2;
              double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
              dist = Math.acos(dist);
              dist = rad2deg(dist);
              return dist * 60 * 1.1515;
            }

            // start off arbitrarily high            
            def hqDistance = 1000000;

            if (!doc['headquarters.coordinates'].empty) {
              hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;
            }
            
            // assume office distance as large as hq distance
            def officeDistance = hqDistance;
            
            // iterate each office and compare it to the currently lowest officeDistance
            for (def office : params._source['offices']) {
              // the coordinates are formatted as "lat,lon" so let's split...
              def latLong = Arrays.asList(office.coordinates.splitOnToken(","));
              // ...and parse them before passing onwards
              def tmpOfficeDistance = geoDistanceInMiles(Float.parseFloat(latLong[0]),
                                                         Float.parseFloat(latLong[1]),
                                                         params.lat,
                                                         params.lon);
              // we're interested in the nearest office...
              if (tmpOfficeDistance < officeDistance) {
                officeDistance = tmpOfficeDistance;
              }
            }
            
            if (hqDistance < officeDistance) {
              return hqDistance;
            }
            
            return officeDistance;
          """
        },
        "type": "Number"
      }
    }
  ]
}
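If you want to check the selection logic without a cluster, the same steps can be mimicked in plain Java. This is only a rough sketch under a few assumptions: the raw _source is modeled as nested java.util Maps and Lists shaped like the sample document, the headquarters coordinates are read from that map rather than from doc values, and all class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical names, mimicking the sort script outside Elasticsearch.
class NearestLocationSketch {
    // Spherical law of cosines, same formula as in the script; result in statute miles.
    static double geoDistanceInMiles(double lat1, double lon1, double lat2, double lon2) {
        double theta = Math.toRadians(lon1 - lon2);
        double cosAngle = Math.sin(Math.toRadians(lat1)) * Math.sin(Math.toRadians(lat2))
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * Math.cos(theta);
        // Clamp to guard against rounding slightly outside [-1, 1].
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
        return Math.toDegrees(Math.acos(cosAngle)) * 60 * 1.1515;
    }

    // Smallest distance among the headquarters and all offices.
    static double nearestMiles(Map<String, Object> source, double lat, double lon) {
        double best = 1000000; // start off arbitrarily high, as in the script
        Map<?, ?> hq = (Map<?, ?>) source.get("headquarters");
        if (hq != null && hq.get("coordinates") instanceof Map) {
            Map<?, ?> c = (Map<?, ?>) hq.get("coordinates");
            best = geoDistanceInMiles(((Number) c.get("lat")).doubleValue(),
                                      ((Number) c.get("lon")).doubleValue(), lat, lon);
        }
        List<?> offices = (List<?>) source.get("offices");
        if (offices != null) {
            for (Object o : offices) {
                // Office coordinates are assumed to be "lat,lon" strings.
                String[] latLon = ((String) ((Map<?, ?>) o).get("coordinates")).split(",");
                double d = geoDistanceInMiles(Double.parseDouble(latLon[0].trim()),
                                              Double.parseDouble(latLon[1].trim()), lat, lon);
                best = Math.min(best, d);
            }
        }
        return best;
    }

    // Stand-in for the raw _source of the sample document.
    static Map<String, Object> sampleDocument() {
        Map<String, Object> office = Map.of("coordinates", "39.9,-74.92", "state", "New Jersey");
        Map<String, Object> hq = Map.of(
                "coordinates", Map.of("lat", 40.7128, "lon", -74.006),
                "state", "NYC");
        return Map.of("offices", List.of(office), "headquarters", hq);
    }

    public static void main(String[] args) {
        System.out.println(nearestMiles(sampleDocument(), 28.9672, -98.4786));
    }
}
```

With the sample values, the New Jersey office is the closer of the two locations, so its distance is the one returned as the sort value.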

Shameless plug: I dive deep into Elasticsearch scripting in a dedicated chapter of my ES Handbook.