Group by terms and get count of nested array property?

I want to get a count over a set of documents, counting those where an array item matches some value.

I have documents like this:

{
    "Name": "jason",
    "Todos": [{
        "State": "COMPLETED",
        "Timer": 10
    },{
        "State": "PENDING",
        "Timer": 5
    }]
}

{
    "Name": "jason",
    "Todos": [{
        "State": "COMPLETED",
        "Timer": 5
    },{
        "State": "PENDING",
        "Timer": 2
    }]
}

{
    "Name": "martin",
    "Todos": [{
        "State": "COMPLETED",
        "Timer": 15
    },{
        "State": "PENDING",
        "Timer": 10
    }]
}

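I want to count how many documents contain at least one todo in the COMPLETED state, grouped by name.

So from the documents above I need to get: jason: 2, martin: 1

Normally I would do this with a terms aggregation on the name and a filter sub-aggregation for the other condition: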

"aggs": {
    "statistics": {
        "terms": {
            "field": "Name"
        },
        "aggs": {
            "test": {
                "filter": {
                    "bool": {
                        "must": [{
                                "match_phrase": {
                                    "SomeProperty.keyword": {
                                        "query": "THEVALUE"
                                    }
                                }
                            }
                        ]
                    }
                },

But I'm not sure how to do that here, because the items are inside an array.

Elasticsearch has no problem with arrays, because it actually flattens them by default:

Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values.

So a query like the one you posted will work. I would, however, use a term query, since State is a keyword field:

POST mytodos/_search
{
  "size": 0,
  "aggs": {
    "by name": {
      "terms": {
        "field": "Name"
      },
      "aggs": {
        "how many completed": {
          "filter": {
            "term": {
              "Todos.State": "COMPLETED"
            }
          }
        }
      }
    }
  }
}
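
With the three sample documents indexed under the mapping assumed below, the aggregations part of the response should look roughly like this. It is abbreviated: fields such as took, hits and doc_count_error_upper_bound are left out. The number you are after is the doc_count inside "how many completed", not the bucket's own doc_count, which counts every document with that name.

```json
{
  "aggregations": {
    "by name": {
      "buckets": [
        {
          "key": "jason",
          "doc_count": 2,
          "how many completed": { "doc_count": 2 }
        },
        {
          "key": "martin",
          "doc_count": 1,
          "how many completed": { "doc_count": 1 }
        }
      ]
    }
  }
}
```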

I assume your mapping looks like this:

PUT mytodos/_mappings
{
  "properties": {
    "Name": {
      "type": "keyword"
    },
    "Todos": {
      "properties": {
        "State": {
          "type": "keyword"
        },
        "Timer": {
          "type": "integer"
        }
      }
    }
  }
}

The first sample document you posted, for example, is converted internally into something like this:

{
  "Name": "jason",
  "Todos.State": ["COMPLETED", "PENDING"],
  "Todos.Timer": [10, 5]
}

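However, if you need to query Todos.State and Todos.Timer together, for example to keep only todos that are "COMPLETED" and also have Timer > 10, this mapping cannot do it, because Elasticsearch loses the link between the fields of each item in the object array.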
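
To see the lost link with the sample documents, here is a sketch of such a combined filter, sent as the body of POST mytodos/_search against the object mapping assumed above. The aggregation name "completed under 10" and the threshold Timer < 10 are my own choices for illustration; I used < 10 because the sample data happens to produce no false match for Timer > 10.

```json
{
  "size": 0,
  "aggs": {
    "by name": {
      "terms": { "field": "Name" },
      "aggs": {
        "completed under 10": {
          "filter": {
            "bool": {
              "filter": [
                { "term": { "Todos.State": "COMPLETED" } },
                { "range": { "Todos.Timer": { "lt": 10 } } }
              ]
            }
          }
        }
      }
    }
  }
}
```

The first jason document matches even though its COMPLETED todo has Timer 10, because the value 5 from the PENDING todo satisfies the range on the flattened Todos.Timer field. jason is therefore counted twice, while only one of his documents really has a COMPLETED todo with Timer < 10.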

In that case, you need to use the nested datatype for such arrays and query them with the special nested query.
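
As a minimal sketch, assume Todos is mapped with "type": "nested" (placed next to its "properties" in the mapping). The type of an existing field cannot be changed in place, so this would mean creating a new index and reindexing. The body below, sent to POST mytodos/_search, then counts per name the documents that have a single todo which is both COMPLETED and has Timer > 10. The aggregation name "completed over 10" is only illustrative.

```json
{
  "size": 0,
  "aggs": {
    "by name": {
      "terms": { "field": "Name" },
      "aggs": {
        "completed over 10": {
          "filter": {
            "nested": {
              "path": "Todos",
              "query": {
                "bool": {
                  "filter": [
                    { "term": { "Todos.State": "COMPLETED" } },
                    { "range": { "Todos.Timer": { "gt": 10 } } }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Because the nested query is used inside a filter aggregation, it still counts parent documents rather than individual todos. With the sample documents, only martin's document qualifies (COMPLETED with Timer 15). Note that once Todos is nested, a plain term query on Todos.State outside a nested query no longer matches.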

Hope that helps!