How to sort by an element inside a deep object in Elasticsearch?

I have this mapping:

{
    "foos": {
        "mappings": {
            "foo": {
                "dynamic": "false",
                "properties": {
                    "some_id": {
                        "type": "integer"
                    },
                    "language": {
                        "type": "text"
                    },
                    "locations": {
                        "type": "integer"
                    },
                    "name": {
                        "type": "text",
                        "term_vector": "yes",
                        "analyzer": "name_analyzer"
                    },
                    "popularity": {
                        "type": "integer"
                    },
                    "some_deep_count": {
                        "type": "object"
                    }
                }
            }
        }
    }
}

A sample document looks like this:

                 {
                    "name": "Some nice name",
                    "some_id": 1,
                    "id": 4378,
                    "popularity": 525,
                    "some_deep_count": {
                        "0": {
                            "32026": 344,
                            "55625": 458,
                            "29": 1077,
                            "55531": 1081,
                            ...
                        },
                        "1": {
                            "32026": 57,
                            "55625": 60,
                            "29": 88,
                            ...
                        }
                    },
                    "locations": [
                        32026,
                        55625,
                        ...
                    ],
                    "language": [
                        "es",
                        "en"
                    ]
                }

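The some_deep_count field can only contain the keys "0" and "1", each of which can hold a very long list of id => value pairs (dynamic, cannot be configured in advance).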

This works very well when filtering the source:

"_source": [
        "id",
        "some_deep_count.*.55529"
    ],

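But I can't figure out how to sort by any of the values in the deep object. I need a sum across the deep keys, as in the following expression: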

...
{
    "sort": {
        "_script": {
            "type": "number",
            "script": {
                "lang": "painless",
                "source": "def deep0 = 0; def deep1 = 0; if(doc.containsKey('some_deep_count.0.55529')) { deep0 = doc['some_deep_count.0.55529'] } if(doc.containsKey('some_deep_count.1.55529')) { deep1 = doc['some_deep_count.1.55529'] } return deep0 + deep1"
            },
            "order": "desc"
        }
    }
}

Unfortunately, this always returns 0 in the sort field, because doc.containsKey('some_deep_count.0.55529') always evaluates to false. So does doc.containsKey('some_deep_count').

Interestingly, doc.containsKey('some_id') does work, and I really don't understand why.

Edit

In response to Val's suggestion, I'm attaching the full request and response.

Request:

{
  "sort": {
  "_script": {
    "type": "number",
    "script": {
      "lang": "painless",
      "source": "def ps0 = 0; if(doc.containsKey('some_deep_count.0.55529')) { ps0 = doc['some_deep_count.0.55529'].value; }  return ps0 "
    },
    "order": "desc"
  }
},
  "_source": [
      "id",
      "some_deep_count.0.55529"
  ],
  "size": 1
}

Response:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2121,
        "max_score": null,
        "hits": [
            {
                "_index": "foos",
                "_type": "foo",
                "_id": "5890",
                "_score": null,
                "_source": {
                    "some_deep_count": {
                        "0": {
                            "55529": 228
                        }
                    },
                    "id": 5890
                },
                "sort": [
                    0.0
                ]
            }
        ]
    }
}

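The problem is probably in the condition. In fact, even with a sort script as simple as "def ps0 = 0; if(doc.containsKey('some_deep_count')) { ps0 = 99999; } return ps0 ", I still get "sort":[0.0], which suggests that something is wrong with the clause doc.containsKey('some_deep_count').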

Edit 2

The index, as returned by curl -XGET localhost:9200/foos, looks like this:

{
  "foos": {
      "aliases": {},
      "mappings": {
          "foo": {
              "dynamic": "false",
              "properties": {
                  "some_id": {
                      "type": "integer"
                  },
                  "language": {
                      "type": "text"
                  },
                  "locations": {
                      "type": "integer"
                  },
                  "name": {
                      "type": "text",
                      "term_vector": "yes",
                      "analyzer": "name_analyzer"
                  },
                  "popularity": {
                      "type": "integer"
                  },
                  "some_deep_count": {
                      "type": "object"
                  }
              }
          }
      },
      "settings": {
          "index": {
              "number_of_shards": "5",
              "provided_name": "foos",
              "creation_date": "1576168104248",
              "analysis": {
                  "analyzer": {
                      "name_analyzer": {
                          "filter": [
                              "lowercase"
                          ],
                          "tokenizer": "keyword"
                      }
                  }
              },
              "number_of_replicas": "0",
              "uuid": "26xckWaOQuuxFrMvIdikvw",
              "version": {
                  "created": "6020199"
              }
          }
      }
  }
}

Thanks

I was able to get a non-zero sort value by reproducing your case as follows:

# 1. create the index mapping
PUT sorts
{
  "mappings": {
    "properties": {
      "some_deep_count": {
        "type": "object"
      }
    }
  }
}

# 2. index a sample document
PUT sorts/_doc/1
{
  "some_deep_count": {
    "0": {
      "29": 1077,
      "32026": 344,
      "55531": 1081,
      "55625": 458
    },
    "1": {
      "29": 88,
      "32026": 57,
      "55625": 60
    }
  }
}

# 3. Sort the results
POST sorts/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
          def deep0 = 0; 
          def deep1 = 0; 
          if(doc.containsKey('some_deep_count.0.55531')) { 
            deep0 = doc['some_deep_count.0.55531'].value;
          } 
          if(doc.containsKey('some_deep_count.1.55531')) { 
            deep1 = doc['some_deep_count.1.55531'].value;
          } 
          return deep0 + deep1;
        """
      },
      "order": "desc"
    }
  }
}

Result => sort = 1081

"hits" : [
  {
    "_index" : "sorts",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : null,
    "_source" : {
      "some_deep_count" : {
        "0" : {
          "29" : 1077,
          "32026" : 344,
          "55531" : 1081,
          "55625" : 458
        },
        "1" : {
          "29" : 88,
          "32026" : 57,
          "55625" : 60
        }
      }
    },
    "sort" : [
      1081.0
    ]
  }
]

As you can see, I used 55531, which exists in some_deep_count.0 but not in some_deep_count.1, and the result is 1081, which is correct.

Using *.29, which is present in both 0 and 1, yields 1165, which is also correct (1077 + 88).
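For reference, here is a minimal sketch of the same sort with the id passed in as a script parameter rather than hard-coded into the source. The parameter name `id` is an assumption made for illustration; the index and document are the `sorts` ones created earlier in this answer. With `"id": "29"` this should give the same 1165 as the `*.29` case.

```
# Same sum as in step 3, but the id comes from params (hypothetical name "id")
POST sorts/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
          // Build the two field names from the id parameter
          def f0 = 'some_deep_count.0.' + params.id;
          def f1 = 'some_deep_count.1.' + params.id;
          def deep0 = 0;
          def deep1 = 0;
          // Only read a field if it is known to the mapping
          if (doc.containsKey(f0)) {
            deep0 = doc[f0].value;
          }
          if (doc.containsKey(f1)) {
            deep1 = doc[f1].value;
          }
          return deep0 + deep1;
        """,
        "params": {
          "id": "29"
        }
      },
      "order": "desc"
    }
  }
}
```

Keeping the id in `params` means the script source stays identical across requests, so it does not have to be recompiled for every different id.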

The only difference between my script and yours is that you need to add .value to the doc field references when assigning deep0 and deep1.

Update

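The problem is that you specified dynamic: false in your mapping. With that parameter, indexing new fields that were not in the mapping when the index was created does not update the mapping. So, as it stands, none of the sub-fields inside some_deep_count are indexed. They are only kept in _source, which is why source filtering still returns them while the sort script always gives 0. Remove dynamic: false and everything will work as expected.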
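One possible way to apply this to the existing index is sketched below. It assumes a 6.x cluster (the index settings show version.created 6020199), so the mapping type `foo` appears in the path. It enables dynamic mapping only on some_deep_count rather than removing dynamic: false from the whole type, which is a narrower variant of the fix described above. Documents indexed before the change would also need to be indexed again before their sub-fields become sortable; the `_update_by_query` call is one way to do that, not necessarily the only one.

```
# 1. Allow new sub-fields under some_deep_count to be mapped dynamically
PUT foos/_mapping/foo
{
  "properties": {
    "some_deep_count": {
      "type": "object",
      "dynamic": true
    }
  }
}

# 2. Re-index the existing documents in place so they pick up the change
POST foos/_update_by_query?conflicts=proceed

# 3. Check that sub-fields such as some_deep_count.0.55529 now appear
GET foos/_mapping
```

Once the sub-fields show up in the mapping, doc.containsKey('some_deep_count.0.55529') should start returning true in the sort script.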