In Kibana's Vega, how can I create layers from two different aggs in one request

Question

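In Elasticsearch's HTTP API, you can include both a bucket aggregation and a metric aggregation in a single request to the _search API. In Kibana's Vega environment, how can you create a Vega visualization that uses a single _search request with both a bucket aggregation and a metric aggregation, and then build a chart where one layer uses the data from the buckets and another layer uses the data from the metric?

To make this question more concrete, consider this example:

Suppose we are hat makers. Multiple stores carry our hats. We have an Elasticsearch index, hat-sales, which contains one document for each hat sold. Each document records the store where the hat was sold.

Here are two examples of documents in that index: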

{
  "type": "top",
  "color": "black",
  "price": 19,
  "store": "Macy's"
}
{
  "type": "fez",
  "color": "red",
  "price": 94,
  "store": "Walmart"
}

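I want to create a bar chart showing the number of hats sold by the top 3 stores. I also want a horizontal rule on this chart showing the average number of hats sold per store across all stores, not just the top 3. Here is a sketch of what I want the chart to look like:

[Image: sketch of the desired chart]

If we do it like this, letting Vega compute the average: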

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "title": "Hat Sales",
  "data": {
    "url": {
      "index": "hat-sales",
      "body": {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {"stores": {"terms": {"field": "store.keyword", "size": 3}}}
      }
    },
    "format": {"property": "aggregations.stores.buckets"}
  },
  "transform": [
    {"calculate": "datum.key", "as": "store"},
    {"calculate": "datum.doc_count", "as": "count"}
  ],
  "layer": [
    {
      "name": "Sales of top 3 stores",
      "mark": "bar",
      "encoding": {
        "x": {"type": "nominal", "field": "store", "sort": "-y"},
        "y": {"type": "quantitative", "field": "count"}
      }
    },
    {
      "name": "Average number of sales over all stores",
      "mark": {"type": "rule", "color": "red"},
      "encoding": {"y": {"aggregate": "mean", "field": "count"}}
    }
  ]
}

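Which looks like this:

[Image: resulting chart]

Then the horizontal rule will only be the average of the top 3 stores. Instead, we need to add another metric aggregation to the Elasticsearch request that calculates the global average number of hats sold per store. We want to do something like this: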

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "title": "Hat Sales",
  "data": {
    "url": {
      "index": "hat-sales",
      "body": {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
          "stores": {"terms": {"field": "store.keyword", "size": 3}},
          "global": {
            "filters": {
              "filters": {"all": {"exists": {"field": "store.keyword"}}}
            },
            "aggs": {
              "count": {"value_count": {"field": "store.keyword"}},
              "unique_count": {"cardinality": {"field": "store.keyword"}},
              "global_average": {
                "bucket_script": {
                  "buckets_path": {"total": "count", "unique": "unique_count"},
                  "script": "params.total / params.unique"
                }
              }
            }
          }
        }
      }
    },
    "format": {"property": "aggregations.stores.buckets"}
  },
  "transform": [
    {"calculate": "datum.key", "as": "store"},
    {"calculate": "datum.doc_count", "as": "count"}
  ],
  "layer": [
    {
      "name": "Sales of top 3 stores",
      "mark": "bar",
      "encoding": {
        "x": {"type": "nominal", "field": "store", "sort": "-y"},
        "y": {"type": "quantitative", "field": "count"}
      }
    },
    {
      "name": "Average number of sales over all stores",
      "mark": {"type": "rule", "color": "red"},
      ??????????????????
    }
  ]
}

But how can I have one layer use the data from "aggregations.stores.buckets" while another layer uses the data from "aggregations.global.buckets" in order to access global_average?
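
To make the gap between the two averages concrete, here is a minimal Python sketch. The store names beyond the two in the sample documents and all of the sales counts are made up for illustration; they are not real data from the hat-sales index. It mirrors the request above: the terms aggregation keeps only the top 3 stores, while the value_count / cardinality pair divides total sales by the number of distinct stores. (In Elasticsearch the cardinality aggregation is approximate, so on a large index the real figure may differ slightly.)

```python
# Hypothetical per-store sales counts (illustration only, not real data).
sales_per_store = {
    "Walmart": 50,
    "Macy's": 30,
    "Target": 20,
    "Kohl's": 10,
    "Sears": 6,
    "JCPenney": 4,
}

# What the "terms" aggregation with size 3 returns: the 3 largest buckets.
top3_counts = sorted(sales_per_store.values(), reverse=True)[:3]

# What Vega's {"aggregate": "mean"} computes when it only sees those buckets.
top3_mean = sum(top3_counts) / len(top3_counts)

# What the bucket_script computes: total documents / number of unique stores.
total = sum(sales_per_store.values())   # analogous to value_count
unique = len(sales_per_store)           # analogous to cardinality
global_average = total / unique

print("mean of top 3 stores:", top3_mean)
print("average over all stores:", global_average)
```

With these made-up numbers the top-3 mean is noticeably higher than the average over all stores, which is why the rule has to come from the second aggregation rather than from the buckets.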

Answer 1

I did get it working with this:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "A simple bar chart with embedded data.",
  "data": {
    "url": {
      "index": "hat-sales",
      "body": {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
          "stores": {"terms": {"field": "store.keyword", "size": 3}},
          "global": {
            "filters": {
              "filters": {"all": {"exists": {"field": "store.keyword"}}}
            },
            "aggs": {
              "count": {"value_count": {"field": "store.keyword"}},
              "unique_count": {"cardinality": {"field": "store.keyword"}},
              "global_average": {
                "bucket_script": {
                  "buckets_path": {"total": "count", "unique": "unique_count"},
                  "script": "params.total / params.unique"
                }
              }
            }
          }
        }
      }
    }
  },
  "transform": [
    {"flatten": ["aggregations.stores.buckets"]},
    {"calculate": "datum['aggregations.stores.buckets'].key", "as": "store"},
    {
      "calculate": "datum['aggregations.stores.buckets'].doc_count",
      "as": "count"
    },
    {
      "calculate": "datum.aggregations.global.buckets.all.global_average.value",
      "as": "global_average"
    }
  ],
  "layer": [
    {
      "name": "Sales of top 3 stores",
      "mark": "bar",
      "encoding": {
        "x": {"type": "nominal", "field": "store", "sort": "-y"},
        "y": {"type": "quantitative", "field": "count"}
      }
    },
    {
      "name": "Global Average",
      "mark": {"type": "rule", "color": "red"},
      "encoding": {"y": {"field": "global_average", "type": "quantitative"}}
    }
  ]
}

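It's less than ideal, because the flatten transform makes each individual datum object somewhat larger. It's also confusing that once you flatten aggregations.stores.buckets, that string becomes the literal name of the datum field, "aggregations.stores.buckets", which must be accessed with bracket notation because it contains periods.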
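
To see why both effects occur, here is a rough Python simulation of what a flatten transform does to the single response object. This is a sketch of the behaviour as I understand it, not Vega's actual implementation; the helper names get_path and flatten are hypothetical, and the response values are made up. Each output row is a copy of the whole input row with one extra key whose name is the literal dotted string.

```python
from copy import copy

def get_path(obj, dotted):
    """Follow a dotted path such as 'a.b.c' through nested dicts."""
    for part in dotted.split("."):
        obj = obj[part]
    return obj

def flatten(rows, field):
    """Emit one row per array element, stored under the literal field name."""
    out = []
    for row in rows:
        for item in get_path(row, field):
            derived = copy(row)    # the row keeps everything it already had
            derived[field] = item  # new key is the dotted string itself
            out.append(derived)
    return out

# Hypothetical, trimmed-down _search response (values are made up).
response = {
    "aggregations": {
        "stores": {"buckets": [
            {"key": "Walmart", "doc_count": 50},
            {"key": "Macy's", "doc_count": 30},
            {"key": "Target", "doc_count": 20},
        ]},
        "global": {"buckets": {"all": {"global_average": {"value": 20.0}}}},
    }
}

rows = flatten([response], "aggregations.stores.buckets")
first = rows[0]

# Bracket-style access: the key literally contains periods.
store = first["aggregations.stores.buckets"]["key"]
# Ordinary nested access still works for the untouched metric.
average = first["aggregations"]["global"]["buckets"]["all"]["global_average"]["value"]
print(len(rows), store, average)
```

Every row still carries the full aggregations object alongside its own bucket, which is the extra size mentioned above, and the global average is repeated on every row, which is what lets the rule layer read it from the same dataset.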

elasticsearch

kibana

vega-lite