ElasticSearch: find multiple unique values in array with complex objects

Suppose there is an index whose documents follow this structure:

{
    "array": [
        {
            "field1": 1,
            "field2": 2
        },
        {
            "field1": 3,
            "field2": 2
        },
        {
            "field1": 3,
            "field2": 2
        },
        ...
    ]
}

Is it possible to define a query that returns the documents in which a given field has more than one unique value?

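For the example above, a query on field2 would not return the document, because every element has the same value, but a query on field1 would return it, because it has the values 1 and 3.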

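The only approach I can think of is to store the unique values in the parent object and then query on its length. Since this seems like a trivial requirement, though, I would rather not have to change the structure to something like: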

{
    "arrayField1Values" : [1, 3],
    "arrayField2Values" : [2],
    "array": [
        {
            "field1": 1,
            "field2": 2
        },
        {
            "field1": 3,
            "field2": 2
        },
        {
            "field1": 3,
            "field2": 2
        },
        ...
    ]
}

Thanks to anyone who can help!

My first instinct was to use the nested data type, but then I realized you can do a simple distinct count over the array values of field1 and field2 using script queries and top_hits:

PUT array

POST array/_doc
{
  "array": [
    {
      "field1": 1,
      "field2": 2
    },
    {
      "field1": 3,
      "field2": 2
    },
    {
      "field1": 3,
      "field2": 2
    }
  ]
}

GET array/_search
{
  "size": 0,
  "aggs": {
    "field1_is_unique": {
      "filter": {
        "script": {
          "script": {
            "source": "def uniques = doc['array.field1'].stream().distinct().sorted().collect(Collectors.toList()); return uniques.length > 1 ;",
            "lang": "painless"
          }
        }
      },
      "aggs": {
        "top_hits_field1": {
          "top_hits": {}
        }
      }
    },
    "field2_is_unique": {
      "filter": {
        "script": {
          "script": {
            "source": "def uniques = doc['array.field2'].stream().distinct().sorted().collect(Collectors.toList()); return uniques.length > 1 ;",
            "lang": "painless"
          }
        }
      },
      "aggs": {
        "top_hits_field2": {
          "top_hits": {}
        }
      }
    }
  }
}

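This yields a separate aggregation for field1 and for field2, each matching only the documents whose count of unique values for that field is greater than 1: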

 "aggregations" : {
    "field1_is_unique" : {
      "doc_count" : 1,
      "top_hits_field1" : {
        "hits" : {
          "total" : {
            "value" : 1,
            "relation" : "eq"
          },
          "max_score" : 1.0,
          "hits" : [
            {
              "_index" : "array",
              "_type" : "_doc",
              "_id" : "WbJhgnEBVBaNYdXKNktL",
              "_score" : 1.0,
              "_source" : {
                "array" : [
                  {
                    "field1" : 1,
                    "field2" : 2
                  },
                  {
                    "field1" : 3,
                    "field2" : 2
                  },
                  {
                    "field1" : 3,
                    "field2" : 2
                  }
                ]
              }
            }
          ]
        }
      }
    },
    "field2_is_unique" : {
      "doc_count" : 0,
      "top_hits_field2" : {
        "hits" : {
          "total" : {
            "value" : 0,
            "relation" : "eq"
          },
          "max_score" : null,
          "hits" : [ ]
        }
      }
    }
  }
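
If only the matching documents are wanted, rather than one aggregation bucket per field, the same Painless check can probably be moved into a script query in filter context. The following is a minimal sketch, not part of the tested output above. It assumes the same array index with default dynamic mapping, so that array.field1 is a numeric field with doc values, and it reuses the script from the aggregation unchanged:

```
GET array/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "def uniques = doc['array.field1'].stream().distinct().sorted().collect(Collectors.toList()); return uniques.length > 1 ;",
            "lang": "painless"
          }
        }
      }
    }
  }
}
```

With the sample document indexed above, this should return it for array.field1 and return nothing if the script is pointed at array.field2 instead. Note that a script query is evaluated against every candidate document, so on a large index the denormalized structure from the question may still be the faster option.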

Hope this helps.