Same aggregation on multiple metrics in Elasticsearch
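I have set up Snowplow with Elasticsearch.
When I want to get data, I just run a regular query and use aggregations to break it down by day, country, and so on.
Now I want to compute the click-through rate (CTR) for these aggregations. I have two kinds of events: page views and clicks.
Currently I run two queries:
Page views: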
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
Clicks:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
I format the responses into something easier to work with, then merge them in PHP with something like this:
function merge_metrics($pv, $c){
    // Collect both metrics per bucket name, starting each counter at 0
    // so that += never touches an undefined index.
    $r = array();
    foreach ($pv as $value) {
        if (!isset($r[$value['name']])) {
            $r[$value['name']] = array('page_views' => 0, 'clicks' => 0);
        }
        $r[$value['name']]['page_views'] += $value['count'];
    }
    foreach ($c as $value) {
        if (!isset($r[$value['name']])) {
            $r[$value['name']] = array('page_views' => 0, 'clicks' => 0);
        }
        $r[$value['name']]['clicks'] += $value['count'];
    }
    $rf = array();
    foreach ($r as $key => $value) {
        // Use the stored values directly; the earlier version assigned
        // the boolean result of isset() to the page view count.
        $rf[] = array(
            'name' => $key,
            'page_views' => $value['page_views'],
            'clicks' => $value['clicks'],
            'ctr' => ctr($value['clicks'], $value['page_views'])
        );
    }
    return $rf;
}
$pv and $c are both arrays holding the results of the Elasticsearch queries, lightly reformatted for ease of use.
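The ctr() helper called in merge_metrics is not shown here. A minimal sketch of what it might look like, assuming CTR is simply clicks divided by page views and that a bucket with no page views should report 0 (both are assumptions, not something the code above confirms):

```php
<?php
// Hypothetical ctr() helper: click-through rate as a fraction in [0, 1].
// Assumption: a bucket with no page views has a CTR of 0.
function ctr($clicks, $page_views)
{
    if ($page_views <= 0) {
        // Avoid dividing by zero when a bucket has clicks but no page views.
        return 0.0;
    }
    // Cast so the result is always a float, even when it divides evenly.
    return (float) $clicks / $page_views;
}
```

Multiply the result by 100 if a percentage is wanted instead of a fraction.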
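My question is:
Is it possible to get multiple metrics (in my case page views and clicks, each defined by a specific filter) and run the same aggregation on both, so that the result comes back looking something like this: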
{
"data": [
{
"day": "2015-10-13",
"page_views": 61,
"clicks": 0
},
{
"day": "2015-10-14",
"page_views": 135,
"clicks": 1
},
{
"day": "2015-10-15",
"page_views": 39,
"clicks": 0,
}
]
}
without my having to merge them manually?
Yes, that is definitely possible if you combine the aggregations into a single query. For example, suppose you have a page view query like this:
{
"query": {...},
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
}
}
}
}
}
And another query for clicks like this:
{
"query": {...},
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
If both queries have the same constraints in query, you can merge them at the date_histogram level, like this:
{
"query": {...},
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
},
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
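UPDATE
Since your query is different for each metric, we need to go about it slightly differently, namely by adding a filters aggregation under the date histogram, like this: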
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"event": [
"page_view",
"struct"
]
}
}
],
"should": {
"term": {
"se_action": "click"
}
},
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
},
"aggs": {
"my_filters": {
"filters": {
"filters": {
"page_views_filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
},
"clicks_filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
}
}
}
}
}
}
Now each daily bucket contains two sub-buckets, one with the page view count and one with the click count.
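To get from that response to the flat shape asked for in the question, the sub-buckets can be read out per day. The following is a rough sketch rather than a definitive implementation: flatten_daily_metrics is a hypothetical name, and it assumes the response was decoded with json_decode($body, true), that the aggregation names match the query above (dates, my_filters, page_views_filter, clicks_filter), and that the histogram uses the default UTC day boundaries.

```php
<?php
// Hypothetical helper: turn the "dates" > "my_filters" aggregation
// into one row per day with page views, clicks and CTR.
function flatten_daily_metrics(array $response)
{
    $rows = array();
    foreach ($response['aggregations']['dates']['buckets'] as $bucket) {
        // A filters aggregation with named filters returns its buckets
        // keyed by filter name, each with its own doc_count.
        $sub = $bucket['my_filters']['buckets'];
        $pageViews = $sub['page_views_filter']['doc_count'];
        $clicks = $sub['clicks_filter']['doc_count'];
        $rows[] = array(
            // "key" is the bucket start in epoch milliseconds.
            'day' => gmdate('Y-m-d', (int) ($bucket['key'] / 1000)),
            'page_views' => $pageViews,
            'clicks' => $clicks,
            // Assumption: days without page views report a CTR of 0.
            'ctr' => $pageViews > 0 ? (float) $clicks / $pageViews : 0.0,
        );
    }
    return array('data' => $rows);
}
```

With this in place, a single request is enough and the manual merge step in PHP is no longer needed; only the renaming of fields remains on the client side.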