Elasticsearch request
I want to make the following request with Elasticsearch-dsl or Elasticsearch:
Select all users with the same name but different ages
Example:
Indexed data:
{ "name": "name1","age": 20 }
{ "name": "name2","age": 23 }
{ "name": "name3","age": 20 }
{ "name": "name1","age": 22 }
{ "name": "name4","age": 18 }
{ "name": "name2","age": 23 }
{ "name": "name4","age": 18 }
{ "name": "name4","age": 14 }
I want a result like this:
{ "name": "name4","age": 18 ,"age": 14 }
{ "name": "name1","age": 22 ,"age": 20 }
This isn't specific to Python. What you need is a terms aggregation on age, with the query restricted to one specific name:
GET /_search
{
"query" : {
"bool" : {
"should" : { "match" : { "name" : "name1"} }
}
},
"aggs": {
"ages_for_name": {
"terms": { "field": "age" }
}
}
}
Run this once for "name1" and once for "name4" to get the "ages_for_name" buckets, then use only the bucket keys (the ages) and ignore the doc counts.
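A minimal sketch of that per-name loop that runs without a cluster: `ages_query_for` builds the request body above for one name, and `ages_from_response` keeps only the bucket keys. Both function names are hypothetical, the `"size": 0` is an addition that skips returning hits, and the sample response is hand-built from the example data rather than captured from a real cluster. With elasticsearch-py you would pass the body as `es.search(index=..., body=ages_query_for(name))`.

```python
def ages_query_for(name):
    """Request body for the per-name query (hypothetical helper; "size": 0 is an addition)."""
    return {
        "size": 0,
        "query": {"bool": {"should": {"match": {"name": name}}}},
        "aggs": {"ages_for_name": {"terms": {"field": "age"}}},
    }


def ages_from_response(resp):
    """Keep only the bucket keys (the ages) and drop the doc counts."""
    return [bucket["key"] for bucket in resp["aggregations"]["ages_for_name"]["buckets"]]


# Hand-built response for "name4", shaped like a terms aggregation result
# (hypothetical sample, not captured from a cluster).
sample_name4 = {
    "aggregations": {
        "ages_for_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": 18, "doc_count": 2}, {"key": 14, "doc_count": 1}],
        }
    }
}

print(ages_from_response(sample_name4))
```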
There is another way to solve this: aggregate on name, then keep only the name buckets whose minimum and maximum ages differ:
POST test/_search
{
"size": 0,
"aggs": {
"names": {
"terms": {
"field": "name.keyword",
"size": 10,
"min_doc_count": 2
},
"aggs": {
"min_age": {
"min": {
"field": "age"
}
},
"max_age": {
"max": {
"field": "age"
}
},
"all_ages": {
"terms": {
"field": "age",
"size": 10
}
},
"diff_ages": {
"bucket_selector": {
"buckets_path": {
"min": "min_age",
"max": "max_age"
},
"script": "params.min != params.max"
}
}
}
}
}
}
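To see why only some names survive, here is the same logic applied in plain Python to the example documents: group by name, drop groups with fewer than 2 documents (`min_doc_count`), keep a group only when its min and max age differ (the `bucket_selector` condition), and list its distinct ages. This is a local illustration of what the aggregation computes, not how Elasticsearch executes it; `names_with_different_ages` is a hypothetical helper, and the ordering (higher doc count first, ties by ascending key) is chosen to match the response shown below.

```python
from collections import Counter, defaultdict

# The example documents from the question
docs = [
    {"name": "name1", "age": 20},
    {"name": "name2", "age": 23},
    {"name": "name3", "age": 20},
    {"name": "name1", "age": 22},
    {"name": "name4", "age": 18},
    {"name": "name2", "age": 23},
    {"name": "name4", "age": 18},
    {"name": "name4", "age": 14},
]


def names_with_different_ages(docs, min_doc_count=2):
    """Local illustration of the terms + min/max + bucket_selector query (hypothetical helper)."""
    ages_by_name = defaultdict(list)
    for doc in docs:
        ages_by_name[doc["name"]].append(doc["age"])

    buckets = []
    for name, ages in ages_by_name.items():
        if len(ages) < min_doc_count:  # "min_doc_count": 2 drops name3
            continue
        if min(ages) == max(ages):  # params.min != params.max drops name2
            continue
        counts = Counter(ages)
        # Like the all_ages buckets: most frequent age first, ties by ascending age
        ordered_ages = sorted(counts, key=lambda age: (-counts[age], age))
        buckets.append({"name": name, "doc_count": len(ages), "ages": ordered_ages})

    # Like the names buckets: largest doc_count first, ties by ascending name
    buckets.sort(key=lambda b: (-b["doc_count"], b["name"]))
    return buckets


print(names_with_different_ages(docs))
```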
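Response: you only get name1 and name4, because name2 has the same min and max age (23), and name3, with a single document, is already excluded by "min_doc_count": 2.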
"buckets" : [
{
"key" : "name4",
"doc_count" : 3,
"max_age" : {
"value" : 18.0
},
"all_ages" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 18,
"doc_count" : 2
},
{
"key" : 14,
"doc_count" : 1
}
]
},
"min_age" : {
"value" : 14.0
}
},
{
"key" : "name1",
"doc_count" : 2,
"max_age" : {
"value" : 22.0
},
"all_ages" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 20,
"doc_count" : 1
},
{
"key" : 22,
"doc_count" : 1
}
]
},
"min_age" : {
"value" : 20.0
}
}
]
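The desired output in the question repeats the "age" key, which a JSON object or Python dict cannot hold (a duplicate key keeps only its last value), so a list of ages is the closest shape. A minimal sketch that maps the buckets above to that shape, assuming they were read from `res["aggregations"]["names"]["buckets"]` (the aggregation is named "names" in the query); `to_name_ages` is a hypothetical helper and the sample buckets are the response above trimmed to the fields used.

```python
def to_name_ages(buckets):
    """Map each name bucket to {"name": ..., "ages": [...]} using the all_ages keys."""
    return [
        {"name": bucket["key"], "ages": [age["key"] for age in bucket["all_ages"]["buckets"]]}
        for bucket in buckets
    ]


# The two buckets from the response above, trimmed to the fields used here
buckets = [
    {"key": "name4", "doc_count": 3,
     "all_ages": {"buckets": [{"key": 18, "doc_count": 2}, {"key": 14, "doc_count": 1}]}},
    {"key": "name1", "doc_count": 2,
     "all_ages": {"buckets": [{"key": 20, "doc_count": 1}, {"key": 22, "doc_count": 1}]}},
]

print(to_name_ages(buckets))
```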
You need to nest one aggregation inside another (a sub-aggregation). Since you are working in Python, here is a Python script:
from elasticsearch import Elasticsearch
# Connect to the Elasticsearch cluster (a URL string works with both the 7.x and 8.x clients)
es = Elasticsearch("http://localhost:9200")
your_data = [
{ "name": "name1","age": 20 },
{ "name": "name2","age": 23 },
{ "name": "name3","age": 20 },
{ "name": "name1","age": 22 },
{ "name": "name4","age": 18 },
{ "name": "name2","age": 23 },
{ "name": "name4","age": 18 },
{ "name": "name4","age": 14 }
]
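your_index_name = "test_index"
# Index your example data
for doc in your_data:
    es.index(index=your_index_name, body=doc)
# Make the new documents visible to search before querying them
es.indices.refresh(index=your_index_name)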
First create a bucket of documents for each name (I call it "buckets_for_name"), then apply a terms sub-aggregation on age inside buckets_for_name:
# terms aggregation on name with a terms sub-aggregation on age
query = {
"aggs": {
"buckets_for_name": {
"terms": { "field": "name.keyword" },
"aggs": {
"age_terms": {
"terms": {
"field": "age"
}
}
}
}
}
}
res = es.search(index=your_index_name, body=query)
# the results are here
res["aggregations"]["buckets_for_name"]["buckets"]
The raw result is not quite in the shape you want:
[{'key': 'name4',
'doc_count': 3,
'age_terms': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 0,
'buckets': [{'key': 18, 'doc_count': 2}, {'key': 14, 'doc_count': 1}]}},
{'key': 'name1',
'doc_count': 2,
'age_terms': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 0,
'buckets': [{'key': 20, 'doc_count': 1}, {'key': 22, 'doc_count': 1}]}},
{'key': 'name2',
'doc_count': 2,
'age_terms': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 0,
'buckets': [{'key': 23, 'doc_count': 2}]}},
{'key': 'name3',
'doc_count': 1,
'age_terms': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 0,
'buckets': [{'key': 20, 'doc_count': 1}]}}]
It's not that clean, so here is a suggestion for reshaping it:
pretty_results = []
for result in res["aggregations"]["buckets_for_name"]["buckets"]:
    d = dict()
    d["name"] = result["key"]
    d["ages"] = []
    for age in result["age_terms"]["buckets"]:
        d["ages"].append(age["key"])
    pretty_results.append(d)
Pretty output:
[{'name': 'name4', 'ages': [18, 14]},
{'name': 'name1', 'ages': [20, 22]},
{'name': 'name2', 'ages': [23]},
{'name': 'name3', 'ages': [20]}]
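The pretty output still lists name2 and name3, which the question wants to exclude. A minimal client-side filter, assuming `pretty_results` has the shape shown above: the age bucket keys of a terms aggregation are already distinct, so a name qualifies when it has more than one. Note that the query above sets no `size` on either terms aggregation, so only the default 10 buckets per level come back; raise `size` if you have more names.

```python
# The pretty output above, copied as sample input
pretty_results = [
    {"name": "name4", "ages": [18, 14]},
    {"name": "name1", "ages": [20, 22]},
    {"name": "name2", "ages": [23]},
    {"name": "name3", "ages": [20]},
]

# Keep only names that appear with more than one distinct age
same_name_different_ages = [r for r in pretty_results if len(r["ages"]) > 1]
print(same_name_different_ages)
```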