Elasticsearch aggregation by arrays of String

Question

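I have an Elasticsearch index in which I store telephony transactions (SMS, MMS, calls, etc.) along with their associated costs.

The key of these documents is the MSISDN (MSISDN = phone number). In my application there are groups of users, and each user can have one or more MSISDNs.

Here is the mapping for this kind of document: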

"mappings" : {
      "cdr" : {
        "properties" : {
          "callDatetime" : {
            "type" : "long"
          },
          "callSource" : {
            "type" : "string"
          },
          "callType" : {
            "type" : "string"
          },
          "callZone" : {
            "type" : "string"
          },
          "calledNumber" : {
            "type" : "string"
          },
          "companyKey" : {
            "type" : "string"
          },
          "consumption" : {
            "properties" : {
              "data" : {
                "type" : "long"
              },
              "voice" : {
                "type" : "long"
              }
            }
          },
          "cost" : {
            "type" : "double"
          },
          "country" : {
            "type" : "string"
          },
          "included" : {
            "type" : "boolean"
          },
          "msisdn" : {
            "type" : "string"
          },
          "network" : {
            "type" : "string"
          }
        }
      }
    }

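My goal and my question:

My goal is to write a query that retrieves the cost by callType, by group. But groups are not represented in Elasticsearch, only in my PostgreSQL database.

So I will write a method that retrieves all the MSISDNs for every existing group, and get something like a list of String arrays containing every MSISDN in each group.

Let's say I have something like: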

"msisdn_by_group" : [
    {
       "group1" : ["01111111111", "02222222222", "03333333333", "04444444444"]
    },
    {
       "group2" : ["05555555555","06666666666"]
    }
]
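
Before generating any query, the list-of-single-key-objects shape above can be flattened into one mapping from group name to MSISDNs. This is only a minimal Python sketch: the function name `flatten_groups` is hypothetical and the phone numbers are illustrative sample values.

```python
def flatten_groups(msisdn_by_group):
    """Merge a list of {group_name: [msisdn, ...]} objects into one dict."""
    groups = {}
    for entry in msisdn_by_group:
        for name, msisdns in entry.items():
            # extend() so a group that appears in several entries is merged
            groups.setdefault(name, []).extend(msisdns)
    return groups


# Illustrative sample data in the same shape as "msisdn_by_group"
msisdn_by_group = [
    {"group1": ["01111111111", "02222222222", "03333333333", "04444444444"]},
    {"group2": ["05555555555", "06666666666"]},
]
groups = flatten_groups(msisdn_by_group)
```

A plain dict keyed by group name is easier to loop over when one aggregation per group has to be emitted.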

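Now I will use this to generate an Elasticsearch query. I want an aggregation that puts all these terms into separate buckets, sums the cost, and then splits it again by callType (to build a stacked bar chart).

I've tried several things but couldn't make it work (histogram, buckets, terms, and sum were the main keywords I was playing with).

If anyone here can help me with the order of the aggregations and the keywords I can use to achieve this, that would be great :) Thanks

EDIT: here is my last attempt. Query: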

{
    "aggs" : {
        "cost_histogram": {
            "terms": {
                "field": "callType"
            },
            "aggs": {
                "cost_histogram_sum" : {
                    "sum": {
                        "field": "cost"
                    }
                }
            }
        }
    }
}

I get the expected result, but it is missing the "group" split, because I don't know how to pass the MSISDN arrays as a criterion:

Result:

"aggregations": {
    "cost_histogram": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "data",
          "doc_count": 5925,
          "cost_histogram_sum": {
            "value": 0
          }
        },
        {
          "key": "sms_mms",
          "doc_count": 5804,
          "cost_histogram_sum": {
            "value": 91.76999999999995
          }
        },
        {
          "key": "voice",
          "doc_count": 5299,
          "cost_histogram_sum": {
            "value": 194.1196
          }
        },
        {
          "key": "sms_mms_plus",
          "doc_count": 35,
          "cost_histogram_sum": {
            "value": 7.2976
          }
        }
      ]
    }
  }
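
For the stacked bar chart, a response of this shape can be reduced to a simple callType-to-cost mapping. The following is a sketch in Python that assumes the response has already been parsed into a dict; the helper name `cost_by_call_type` is hypothetical, and the sample response is a shortened copy of the result above.

```python
def cost_by_call_type(response):
    """Map each callType bucket key to its summed cost."""
    buckets = response["aggregations"]["cost_histogram"]["buckets"]
    return {b["key"]: b["cost_histogram_sum"]["value"] for b in buckets}


# Shortened copy of the result shown above (two of the four buckets)
response = {
    "aggregations": {
        "cost_histogram": {
            "buckets": [
                {"key": "data", "doc_count": 5925,
                 "cost_histogram_sum": {"value": 0}},
                {"key": "voice", "doc_count": 5299,
                 "cost_histogram_sum": {"value": 194.1196}},
            ]
        }
    }
}
costs = cost_by_call_type(response)
```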

Answer 1

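OK, I found out how to do this with a single query, but it is a damn long query because it repeats for every group, and I had no choice. I'm using the "filter" aggregation.

Here is a working example based on the array I wrote in my question above: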

POST localhost:9200/cdr/_search?size=0

{
    "query": {
        "term" : {
            "companyKey" : 1
        }   
    },
    "aggs" : {
        "group_1_split_cost": {
            "filter": {
                "bool": {
                    "should": [{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "01111111111"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "02222222222"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "03333333333"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "04444444444"
                                }
                            }
                        }
                    }]
                }
            },
            "aggs": {
                "cost_histogram": {
                    "terms": {
                        "field": "callType"
                    },
                    "aggs": {
                        "cost_histogram_sum" : {
                            "sum": {
                                "field": "cost"
                            }
                        }
                    }
                }
            }
        },
        "group_2_split_cost": {
            "filter": {
                "bool": {
                    "should": [{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "05555555555"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "06666666666"
                                }
                            }
                        }
                    }]
                }
            },
            "aggs": {
                "cost_histogram": {
                    "terms": {
                        "field": "callType"
                    },
                    "aggs": {
                        "cost_histogram_sum" : {
                            "sum": {
                                "field": "cost"
                            }
                        }
                    }
                }
            }
        }
    }
}
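
Since the body repeats the same structure for every group, it can be generated rather than written by hand. The Python sketch below reproduces the shape of the query above from a group-to-MSISDN dict. The function name `build_group_cost_query` and the `<group>_split_cost` naming are assumptions made for illustration (the query above names its aggregations `group_1_split_cost` and `group_2_split_cost`).

```python
import json


def build_group_cost_query(groups, company_key):
    """Build one "filter" aggregation per group, each split by callType."""
    aggs = {}
    for name, msisdns in groups.items():
        aggs[f"{name}_split_cost"] = {
            "filter": {
                "bool": {
                    # one "should" clause per MSISDN, same shape as above
                    "should": [
                        {"bool": {"must": {"match": {"msisdn": m}}}}
                        for m in msisdns
                    ]
                }
            },
            "aggs": {
                "cost_histogram": {
                    "terms": {"field": "callType"},
                    "aggs": {
                        "cost_histogram_sum": {"sum": {"field": "cost"}}
                    },
                }
            },
        }
    return {"query": {"term": {"companyKey": company_key}}, "aggs": aggs}


groups = {
    "group1": ["01111111111", "02222222222", "03333333333", "04444444444"],
    "group2": ["05555555555", "06666666666"],
}
body = build_group_cost_query(groups, 1)
payload = json.dumps(body, indent=4)  # JSON text to send as the request body
```

The resulting `payload` would be posted to the same `cdr/_search?size=0` endpoint as the hand-written query.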

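Thanks to newer versions of Elasticsearch we can now nest aggregations very deeply, but it's still a bit too bad that we can't pass arrays of values to an "OR" operator or something like that. It could reduce the size of those queries, I guess. Even if they're a bit special and used in niche cases, like mine.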
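
The length concern might be addressable with two existing features: the `terms` query takes a list of values and matches documents containing any of them, and the `filters` aggregation accepts several named filters under a single aggregation. The Python sketch below builds such a body. It has not been tested against the index from the question, and the names `build_filters_query` and `by_group` are hypothetical. One caveat: `terms` is not analyzed, so this assumes the `msisdn` field indexes each number as a single unchanged token.

```python
def build_filters_query(groups, company_key):
    """One "filters" aggregation with a named "terms" filter per group."""
    return {
        "size": 0,  # only aggregations are needed, no hits
        "query": {"term": {"companyKey": company_key}},
        "aggs": {
            "by_group": {
                "filters": {
                    "filters": {
                        # each group becomes one bucket, named after the group
                        name: {"terms": {"msisdn": msisdns}}
                        for name, msisdns in groups.items()
                    }
                },
                "aggs": {
                    "cost_histogram": {
                        "terms": {"field": "callType"},
                        "aggs": {
                            "cost_histogram_sum": {"sum": {"field": "cost"}}
                        },
                    }
                },
            }
        },
    }


groups = {
    "group1": ["01111111111", "02222222222", "03333333333", "04444444444"],
    "group2": ["05555555555", "06666666666"],
}
compact_body = build_filters_query(groups, 1)
```

With this shape the callType sub-aggregation is written once and applied to every group bucket, instead of being repeated per group.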
