Elasticsearch aggregation on array field within a range
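I want to run a count aggregation on an array field, counting only the values that fall within a range. For example, I have the following 3 documents, and I want to find how many values in purchase_date_list lie between 20210101 and now(). The expected result (number of purchases between 20210101 and now()) would be:
customer_id: 1, purchase count: 2
customer_id: 2, purchase count: 0
customer_id: 3, purchase count: 1
Can anyone offer some ideas on how to write an aggregation query for the above request?
Thanks a lot!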
{
customer_id: 1,
purchase_date_list: [
20050101,
20210304,
20211121
]
},
{
customer_id: 2,
purchase_date_list: [
20100301
]
},
{
customer_id: 3,
purchase_date_list: [
20210701
]
}
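Following up on my comment, I solved it using Painless
(since I'm still not sure how to handle it with an aggregation).
This is the documentation that helped me solve it. [doc]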
Setup
PUT /so_agg_test/
POST /so_agg_test/_doc
{
"customer_id": 1,
"purchase_date_list": [
20050101,
20210304,
20211121
]
}
POST /so_agg_test/_doc
{
"customer_id": 2,
"purchase_date_list": [
20100301
]
}
POST /so_agg_test/_doc
{
"customer_id": 3,
"purchase_date_list": [
20210701
]
}
GET /so_agg_test/_search
Solution
This query will create a new field named number_of_sales_interval in your hits.
GET /so_agg_test/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "number_of_sales_interval": {
      "script": {
        "lang": "painless",
        "params": {
          "lower_bound": 20210101
        },
        "source": """
          // All values of the array field for the current document
          def dates = doc['purchase_date_list'];
          def number_of_sales_interval = 0;
          for (date in dates) {
            // Strict comparison: a value equal to lower_bound is not counted
            if (date > params.lower_bound) {
              number_of_sales_interval += 1;
            }
          }
          return number_of_sales_interval;
        """
      }
    }
  }
}
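The counting loop in the script above can be sanity-checked outside Elasticsearch. What follows is only a minimal Python sketch that mirrors the same logic on the three sample documents; the function name count_after and the in-memory docs list are illustrative names, not part of any Elasticsearch API.

```python
# Local mirror of the Painless loop, used only to check the expected counts.
# `count_after` and `docs` are illustrative names, not Elasticsearch APIs.

def count_after(dates, lower_bound):
    """Count the values strictly greater than lower_bound, as the script does."""
    return sum(1 for date in dates if date > lower_bound)


# The three sample documents from the question
docs = [
    {"customer_id": 1, "purchase_date_list": [20050101, 20210304, 20211121]},
    {"customer_id": 2, "purchase_date_list": [20100301]},
    {"customer_id": 3, "purchase_date_list": [20210701]},
]

counts = {
    d["customer_id"]: count_after(d["purchase_date_list"], 20210101)
    for d in docs
}
print(counts)  # {1: 2, 2: 0, 3: 1}
```

These match the expected purchase counts listed in the question.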
Result
You should get something like this.
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "so_agg_test",
"_type" : "_doc",
"_id" : "UOVplX0B0iK523s0yaCu",
"_score" : 1.0,
"fields" : {
"number_of_sales_interval" : [
2
]
}
},
{
"_index" : "so_agg_test",
"_type" : "_doc",
"_id" : "UeVplX0B0iK523s01KC-",
"_score" : 1.0,
"fields" : {
"number_of_sales_interval" : [
0
]
}
},
{
"_index" : "so_agg_test",
"_type" : "_doc",
"_id" : "UuVplX0B0iK523s04KAT",
"_score" : 1.0,
"fields" : {
"number_of_sales_interval" : [
1
]
}
}
]
}
}
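Each count comes back as a one-element array under fields, and the hits shown above carry no customer_id, so reading the values back takes a little unpacking. Here is a minimal Python sketch, assuming the response has already been parsed into a dict shaped like the one above. The helper name counts_by_id is hypothetical, and falling back to 0 when the field is missing is my own assumption.

```python
def counts_by_id(response):
    """Map each hit's _id to its number_of_sales_interval value."""
    result = {}
    for hit in response["hits"]["hits"]:
        # The script field value is wrapped in a list, as in the response above
        values = hit.get("fields", {}).get("number_of_sales_interval", [])
        # Assumption: treat a missing field as a count of 0
        result[hit["_id"]] = values[0] if values else 0
    return result


# Trimmed copy of the response shown above
response = {
    "hits": {
        "hits": [
            {"_id": "UOVplX0B0iK523s0yaCu",
             "fields": {"number_of_sales_interval": [2]}},
            {"_id": "UeVplX0B0iK523s01KC-",
             "fields": {"number_of_sales_interval": [0]}},
            {"_id": "UuVplX0B0iK523s04KAT",
             "fields": {"number_of_sales_interval": [1]}},
        ]
    }
}

print(counts_by_id(response))
```

The result is keyed by _id because that is the only identifier present in the hits shown.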