Elasticsearch 聚合没有提供理想的输出
Elasticsearch Aggregation not giving desirable otput
我有一个对象,其中包含给患者服用的大量药物。
可对患者施用多于一种药物。
我正在尝试汇总在指定时间内给患者服用的药物总数。
这是我的对象的示例。
{
"_uid" : "953a4af9901847c3b206dac7cee5b298",
"_fullName" : "Test Patient",
"_created": "2021-12-18 22:48:45",
"_treatment" : {
"_created" : "2021-12-18 22:48:45",
"_drugs" : [
{
"_name" : "Another Tablet",
"_uid" : "5a09f6a9c415465a84a8661f35ac621d",
"_mils" : "500"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "300"
}
]
}
}
在 Kibana 中,我做了以下操作
{
"query": {
"bool": {
"filter": {
"range": {
"_created": {
"gte": "2021-01-01 00:00:00",
"lte": "2021-12-31 00:00:00"
}
}
}
}
},
"size": 0,
"aggs" : {
"men" : {
"terms": {
"field": "_treatment._drugs._name.keyword"
},
"aggs": {
"milsUsed": { "sum": { "field": "_treatment._drugs._mils" } }
}
}
}
}
目前 kibana 正在将所有磨机加在一起,而不是将它们分开。以下是 Kibana 的回复。
"aggregations" : {
"men" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Another Tablet",
"doc_count" : 2,
"milsUsed" : {
"value" : 1100.0
}
},
{
"key" : "Test Drug",
"doc_count" : 2,
"milsUsed" : {
"value" : 1100.0
}
}
]
}
}
我希望得到的预期响应
"aggregations" : {
"men" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Another Tablet",
"doc_count" : 1,
"milsUsed" : {
"value" : 500.0
}
},
{
"key" : "Test Drug",
"doc_count" : 1,
"milsUsed" : {
"value" : 300.0
}
}
]
}
}
索引映射
{
"patients" : {
"mappings" : {
"properties" : {
"_fullName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"_treatment" : {
"properties": {
"_drugs": {
"properties": {
"_mils" : {
"type" : "long"
},
"_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},,
"_uid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
}
}
}
}
}
}
}
}
TLDR;
你听说过弹性搜索中的 nested fields 吗?
内部 Elastic 搜索将文档中的嵌套对象展平。
所以如果你有
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
索引中 json 个文档的内部表示将是
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
在你的情况下,当你执行聚合时,它会做同样的事情。突然之间,因为 flattening
操作。你失去了 _drugs._name
和 _drugs._mils
之间的“关系”
以下是解决您使用示例的宠物项目。
例子
- 设置
PUT /so_agg_sum_drugs/
{
"mappings": {
"properties": {
"_fullName": {
"type": "keyword"
},
"_treatment": {
"properties": {
"_drugs": {
"type": "nested", <- nested field type !!
"properties": {
"_mils": {
"type": "long"
},
"_name": {
"type": "keyword"
},
"_uid": {
"type": "keyword"
}
}
}
}
}
}
}
}
POST /so_agg_sum_drugs/_doc
{
"_fullName" : "Test Patient",
"_treatment" : {
"_drugs" : [
{
"_name" : "Another Tablet",
"_uid" : "5a09f6a9c415465a84a8661f35ac621d",
"_mils" : "500"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "300"
}
]
}
}
POST /so_agg_sum_drugs/_doc
{
"_fullName" : "Test Patient 2",
"_treatment" : {
"_drugs" : [
{
"_name" : "Another Tablet",
"_uid" : "5a09f6a9c415465a84a8661f35ac621d",
"_mils" : "500"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "400"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "300"
}
]
}
}
- 解决方案
除了嵌套字段类型外,您的聚合基本正确。您可以在此处找到有关嵌套字段聚合的一些文档。 [doc]
GET /so_agg_sum_drugs/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"patients": {
"terms": {
"field": "_fullName"
},
"aggs": {
"drugs": {
"nested": {
"path": "_treatment._drugs". <- wrap you agg on the drugs objects in a nested type agg.
},
"aggs": {
"per_drug": {
"terms": {
"field": "_treatment._drugs._name"
},
"aggs": {
"quantity": {
"sum": {
"field": "_treatment._drugs._mils"
}
}
}
}
}
}
}
}
}
}
{
"took" : 350,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"patients" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Test Patient",
"doc_count" : 1,
"drugs" : {
"doc_count" : 2,
"per_drug" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Another Tablet",
"doc_count" : 1,
"quantity" : {
"value" : 500.0
}
},
{
"key" : "Test Drug",
"doc_count" : 1,
"quantity" : {
"value" : 300.0
}
}
]
}
}
},
{
"key" : "Test Patient 2",
"doc_count" : 1,
"drugs" : {
"doc_count" : 3,
"per_drug" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Test Drug",
"doc_count" : 2,
"quantity" : {
"value" : 700.0
}
},
{
"key" : "Another Tablet",
"doc_count" : 1,
"quantity" : {
"value" : 500.0
}
}
]
}
}
}
]
}
}
}
我有一个对象,其中包含给患者服用的大量药物。 可对患者施用多于一种药物。 我正在尝试汇总在指定时间内给患者服用的药物总数。
这是我的对象的示例。
{
"_uid" : "953a4af9901847c3b206dac7cee5b298",
"_fullName" : "Test Patient",
"_created": "2021-12-18 22:48:45",
"_treatment" : {
"_created" : "2021-12-18 22:48:45",
"_drugs" : [
{
"_name" : "Another Tablet",
"_uid" : "5a09f6a9c415465a84a8661f35ac621d",
"_mils" : "500"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "300"
}
]
}
}
在 Kibana 中,我做了以下操作
{
"query": {
"bool": {
"filter": {
"range": {
"_created": {
"gte": "2021-01-01 00:00:00",
"lte": "2021-12-31 00:00:00"
}
}
}
}
},
"size": 0,
"aggs" : {
"men" : {
"terms": {
"field": "_treatment._drugs._name.keyword"
},
"aggs": {
"milsUsed": { "sum": { "field": "_treatment._drugs._mils" } }
}
}
}
}
目前 kibana 正在将所有磨机加在一起,而不是将它们分开。以下是 Kibana 的回复。
"aggregations" : {
"men" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Another Tablet",
"doc_count" : 2,
"milsUsed" : {
"value" : 1100.0
}
},
{
"key" : "Test Drug",
"doc_count" : 2,
"milsUsed" : {
"value" : 1100.0
}
}
]
}
}
我希望得到的预期响应
"aggregations" : {
"men" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Another Tablet",
"doc_count" : 1,
"milsUsed" : {
"value" : 500.0
}
},
{
"key" : "Test Drug",
"doc_count" : 1,
"milsUsed" : {
"value" : 300.0
}
}
]
}
}
索引映射
{
"patients" : {
"mappings" : {
"properties" : {
"_fullName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"_treatment" : {
"properties": {
"_drugs": {
"properties": {
"_mils" : {
"type" : "long"
},
"_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},,
"_uid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
}
}
}
}
}
}
}
}
TLDR;
你听说过弹性搜索中的 nested fields 吗? 内部 Elastic 搜索将文档中的嵌套对象展平。
所以如果你有
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
索引中 json 个文档的内部表示将是
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
在你的情况下,当你执行聚合时,它会做同样的事情。突然之间,因为 flattening
操作。你失去了 _drugs._name
和 _drugs._mils
以下是解决您使用示例的宠物项目。
例子
- 设置
PUT /so_agg_sum_drugs/
{
"mappings": {
"properties": {
"_fullName": {
"type": "keyword"
},
"_treatment": {
"properties": {
"_drugs": {
"type": "nested", <- nested field type !!
"properties": {
"_mils": {
"type": "long"
},
"_name": {
"type": "keyword"
},
"_uid": {
"type": "keyword"
}
}
}
}
}
}
}
}
POST /so_agg_sum_drugs/_doc
{
"_fullName" : "Test Patient",
"_treatment" : {
"_drugs" : [
{
"_name" : "Another Tablet",
"_uid" : "5a09f6a9c415465a84a8661f35ac621d",
"_mils" : "500"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "300"
}
]
}
}
POST /so_agg_sum_drugs/_doc
{
"_fullName" : "Test Patient 2",
"_treatment" : {
"_drugs" : [
{
"_name" : "Another Tablet",
"_uid" : "5a09f6a9c415465a84a8661f35ac621d",
"_mils" : "500"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "400"
},
{
"_name" : "Test Drug",
"_uid" : "36c7fcf048c743078ca4c80d187d86c9",
"_mils" : "300"
}
]
}
}
- 解决方案
除了嵌套字段类型外,您的聚合基本正确。您可以在此处找到有关嵌套字段聚合的一些文档。 [doc]
GET /so_agg_sum_drugs/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"patients": {
"terms": {
"field": "_fullName"
},
"aggs": {
"drugs": {
"nested": {
"path": "_treatment._drugs". <- wrap you agg on the drugs objects in a nested type agg.
},
"aggs": {
"per_drug": {
"terms": {
"field": "_treatment._drugs._name"
},
"aggs": {
"quantity": {
"sum": {
"field": "_treatment._drugs._mils"
}
}
}
}
}
}
}
}
}
}
{
"took" : 350,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"patients" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Test Patient",
"doc_count" : 1,
"drugs" : {
"doc_count" : 2,
"per_drug" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Another Tablet",
"doc_count" : 1,
"quantity" : {
"value" : 500.0
}
},
{
"key" : "Test Drug",
"doc_count" : 1,
"quantity" : {
"value" : 300.0
}
}
]
}
}
},
{
"key" : "Test Patient 2",
"doc_count" : 1,
"drugs" : {
"doc_count" : 3,
"per_drug" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Test Drug",
"doc_count" : 2,
"quantity" : {
"value" : 700.0
}
},
{
"key" : "Another Tablet",
"doc_count" : 1,
"quantity" : {
"value" : 500.0
}
}
]
}
}
}
]
}
}
}