Elasticsearch aggregation not giving the desired output

Question

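I have an object that contains the drugs given to a patient. A patient can be given more than one drug. I am trying to sum the total amount of each drug given to patients within a specified time period.

Here is an example of my object.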

{
    "_uid" : "953a4af9901847c3b206dac7cee5b298",
    "_fullName" : "Test Patient",
    "_created": "2021-12-18 22:48:45",
    "_treatment" : {
        "_created" : "2021-12-18 22:48:45",
        "_drugs" : [
            {
                "_name" : "Another Tablet",
                "_uid" : "5a09f6a9c415465a84a8661f35ac621d",
                "_mils" : "500"
            },
            {
                "_name" : "Test Drug",
                "_uid" : "36c7fcf048c743078ca4c80d187d86c9",
                "_mils" : "300"
            }
        ]
    }
}

In Kibana, I ran the following query:

{
  "query": {
    "bool": {
      "filter": {
         "range": {
             "_created": {
                 "gte": "2021-01-01 00:00:00",
                 "lte": "2021-12-31 00:00:00"
             }
         }
      }
    }
  },
  "size": 0,
  "aggs" : {
      "men" : {
        "terms": {
          "field": "_treatment._drugs._name.keyword"
        },
        "aggs": {
          "milsUsed": { "sum": { "field": "_treatment._drugs._mils" } }
        }
      }
    }
}

Currently Kibana adds all the mils together instead of keeping them separate per drug. Below is the response from Kibana.

"aggregations" : {
    "men" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Another Tablet",
          "doc_count" : 2,
          "milsUsed" : {
            "value" : 1100.0
          }
        },
        {
          "key" : "Test Drug",
          "doc_count" : 2,
          "milsUsed" : {
            "value" : 1100.0
          }
        }
      ]
    }
  }

The response I expect:

"aggregations" : {
    "men" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Another Tablet",
          "doc_count" : 1,
          "milsUsed" : {
            "value" : 500.0
          }
        },
        {
          "key" : "Test Drug",
          "doc_count" : 1,
          "milsUsed" : {
            "value" : 300.0
          }
        }
      ]
    }
  }

Index mapping

{
    "patients" : {
        "mappings" : {
            "properties" : {
                "_fullName" : {
                    "type" : "text",
                    "fields" : {
                        "keyword" : {
                            "type" : "keyword",
                            "ignore_above" : 256
                        }
                    }
                },
                "_treatment" : {
                    "properties": {
                        "_drugs": {
                            "properties": {
                                "_mils" : {
                                    "type" : "long"
                                },
                                "_name" : {
                                    "type" : "text",
                                    "fields" : {
                                        "keyword" : {
                                            "type" : "keyword",
                                             "ignore_above" : 256
                                        }
                                    }
                                },
                                "_uid" : {
                                    "type" : "text",
                                    "fields" : {
                                        "keyword" : {
                                            "type" : "keyword",
                                             "ignore_above" : 256
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Answer 1

TL;DR

Have you heard of nested fields in Elasticsearch? Internally, Elasticsearch flattens arrays of objects inside a document.

So if you have

{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

the internal representation of the JSON document in the index will be

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}
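To make the flattening concrete, here is a minimal Python sketch that reproduces the representation above. The `flatten` helper is hypothetical and only an approximation: lowercasing and sorting stand in for what text analysis and the index do to the values, and real Elasticsearch internals are more involved.

```python
def flatten(obj, prefix=""):
    """Collapse objects and arrays of objects into dotted field names,
    each holding a sorted list of lowercased values."""
    fields = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the object and merge its values into the
                # shared dotted field; the object boundary is lost here.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(str(item).lower())
    return {path: sorted(values) for path, values in fields.items()}


doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}

flat = flatten(doc)
print(flat)
```

After flattening, nothing records that "John" belonged with "Smith" rather than "White".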

In your case, the same thing happens when you run the aggregation. Because of this flattening, you lose the relationship between _drugs._name and _drugs._mils.
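The effect on the aggregation can be simulated in a few lines of Python. This is only a sketch of the behaviour, not how Elasticsearch is implemented; `flattened_sum` and `nested_sum` are hypothetical helpers, and the document is the sample patient from the question with `_mils` written as integers.

```python
from collections import defaultdict

# Sample patient from the question (only the fields the aggregation uses).
doc = {
    "_fullName": "Test Patient",
    "_treatment": {
        "_drugs": [
            {"_name": "Another Tablet", "_mils": 500},
            {"_name": "Test Drug", "_mils": 300},
        ]
    },
}


def flattened_sum(docs):
    """Object mapping: names and mils are independent multi-valued fields,
    so each name bucket sums every _mils value of the matching document."""
    buckets = defaultdict(lambda: {"doc_count": 0, "mils": 0})
    for d in docs:
        drugs = d["_treatment"]["_drugs"]
        names = sorted({drug["_name"] for drug in drugs})
        total_mils = sum(drug["_mils"] for drug in drugs)
        for name in names:
            buckets[name]["doc_count"] += 1
            buckets[name]["mils"] += total_mils
    return dict(buckets)


def nested_sum(docs):
    """Nested mapping: each drug object is aggregated on its own,
    so a name is only ever paired with its own _mils value."""
    buckets = defaultdict(lambda: {"doc_count": 0, "mils": 0})
    for d in docs:
        for drug in d["_treatment"]["_drugs"]:
            buckets[drug["_name"]]["doc_count"] += 1
            buckets[drug["_name"]]["mils"] += drug["_mils"]
    return dict(buckets)


flat_result = flattened_sum([doc])
nested_result = nested_sum([doc])
print(flat_result)
print(nested_result)
```

With the flattened model both buckets report 800 for this document, while the nested model gives 500 for "Another Tablet" and 300 for "Test Drug", which is the shape of output the question asks for.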

Below is a toy example that works through your use case.

Example

Setup

The key change from your mapping is that _drugs is declared with the nested field type.

PUT /so_agg_sum_drugs/
{
  "mappings": {
    "properties": {
      "_fullName": {
        "type": "keyword"
      },
      "_treatment": {
        "properties": {
          "_drugs": {
            "type": "nested",
            "properties": {
              "_mils": {
                "type": "long"
              },
              "_name": {
                "type": "keyword"
              },
              "_uid": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

POST /so_agg_sum_drugs/_doc
{
  "_fullName" : "Test Patient",
  "_treatment" : {
    "_drugs" : [
      {
        "_name" : "Another Tablet",
        "_uid" : "5a09f6a9c415465a84a8661f35ac621d",
        "_mils" : "500"
      },
      {
        "_name" : "Test Drug",
        "_uid" : "36c7fcf048c743078ca4c80d187d86c9",
        "_mils" : "300"
      }
    ]
  }
}

POST /so_agg_sum_drugs/_doc
{
  "_fullName" : "Test Patient 2",
  "_treatment" : {
    "_drugs" : [
      {
        "_name" : "Another Tablet",
        "_uid" : "5a09f6a9c415465a84a8661f35ac621d",
        "_mils" : "500"
      },
      {
        "_name" : "Test Drug",
        "_uid" : "36c7fcf048c743078ca4c80d187d86c9",
        "_mils" : "400"
      },
      {
        "_name" : "Test Drug",
        "_uid" : "36c7fcf048c743078ca4c80d187d86c9",
        "_mils" : "300"
      }
    ]
  }
}

Solution

Apart from the missing nested field type, your aggregation was almost right. Wrap the aggregation on the drug objects in a nested aggregation whose path points at _treatment._drugs. Documentation on nested aggregations can be found here: [doc]

GET /so_agg_sum_drugs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "patients": {
      "terms": {
        "field": "_fullName"
      },
      "aggs": {
        "drugs": {
          "nested": {
            "path": "_treatment._drugs"
          },
          "aggs": {
            "per_drug": {
              "terms": {
                "field": "_treatment._drugs._name"
              },
              "aggs": {
                "quantity": {
                  "sum": {
                    "field": "_treatment._drugs._mils"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This returns:

{
  "took" : 350,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "patients" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Test Patient",
          "doc_count" : 1,
          "drugs" : {
            "doc_count" : 2,
            "per_drug" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "Another Tablet",
                  "doc_count" : 1,
                  "quantity" : {
                    "value" : 500.0
                  }
                },
                {
                  "key" : "Test Drug",
                  "doc_count" : 1,
                  "quantity" : {
                    "value" : 300.0
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "Test Patient 2",
          "doc_count" : 1,
          "drugs" : {
            "doc_count" : 3,
            "per_drug" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "Test Drug",
                  "doc_count" : 2,
                  "quantity" : {
                    "value" : 700.0
                  }
                },
                {
                  "key" : "Another Tablet",
                  "doc_count" : 1,
                  "quantity" : {
                    "value" : 500.0
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
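The query above groups by patient first. If only per-drug totals within a date range are wanted, as in the question, the patient-level terms aggregation can probably be dropped and the range filter restored. The sketch below builds such a request body as a Python dict. The function name `build_drug_totals_request` is hypothetical, and it assumes the index uses the nested mapping from the setup and also maps `_created` as a date whose format accepts values like "2021-01-01 00:00:00" (the setup index above does not define that field).

```python
import json


def build_drug_totals_request(start, end):
    """Build a search body that sums _mils per drug name, restricted to
    documents created between start and end (inclusive)."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"_created": {"gte": start, "lte": end}}}
                ]
            }
        },
        "aggs": {
            "drugs": {
                # Step into the nested drug objects before bucketing.
                "nested": {"path": "_treatment._drugs"},
                "aggs": {
                    "per_drug": {
                        "terms": {"field": "_treatment._drugs._name"},
                        "aggs": {
                            "milsUsed": {
                                "sum": {"field": "_treatment._drugs._mils"}
                            }
                        },
                    }
                },
            }
        },
    }


body = build_drug_totals_request("2021-01-01 00:00:00", "2021-12-31 00:00:00")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a `GET /so_agg_sum_drugs/_search` line in the Kibana console. The per-drug buckets then appear under `aggregations.drugs.per_drug` rather than at the top level.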

Tags: elasticsearch, kibana