Search for documents with the highest field value
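I am trying to fetch all documents that have the highest value of a field (plus a term filter as a condition).
Given the Employees mapping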
Name Department Salary
----------------------------
Tomcat Dev 100
Bobcat QA 90
Beast QA 100
Tom Dev 100
Bob Dev 90
In SQL it would look like
select * from Employees where Salary = (select max(salary) from Employees)
Expected output
Name Department Salary
----------------------------
Tomcat Dev 100
Beast QA 100
Tom Dev 100
and
select * from Employees where Salary = (select max(salary) from Employees where Department ='Dev' )
Expected output
Name Department Salary
----------------------------
Tomcat Dev 100
Tom Dev 100
Is this possible in Elasticsearch?
The following should help:
Looking at your data, I've come up with the following mapping:
Mapping:
PUT my-salary-index
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"department":{
"type": "keyword"
},
"salary":{
"type": "float"
}
}
}
}
Sample documents:
POST my-salary-index/_doc/1
{
"name": "Tomcat",
"department": "Dev",
"salary": 100
}
POST my-salary-index/_doc/2
{
"name": "Bobcat",
"department": "QA",
"salary": 90
}
POST my-salary-index/_doc/3
{
"name": "Beast",
"department": "QA",
"salary": 100
}
POST my-salary-index/_doc/4
{
"name": "Tom",
"department": "Dev",
"salary": 100
}
POST my-salary-index/_doc/5
{
"name": "Bob",
"department": "Dev",
"salary": 90
}
Solution:
Scenario 1: Return all employees with the highest salary
POST my-salary-index/_search
{
"size": 0,
"aggs": {
"my_employees_salary":{
"terms": {
"field": "salary",
"size": 1, <--- Note this
"order": {
"_key": "desc"
}
},
"aggs": {
"my_employees": {
"top_hits": { <--- Note this. Top hits aggregation
"size": 10
}
}
}
}
}
}
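Note that I used a Terms aggregation with a Top Hits aggregation chained to it as a sub-aggregation. I'd suggest reading the documentation for both aggregations.
So basically you only need to retrieve the first bucket of the Terms aggregation, which is why I set size: 1. Also note the order setting, in case you ever need the lowest value instead (sort _key in asc order).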
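To make the logic concrete, here is a minimal pure-Python sketch of what the request above computes, run against the five sample documents. It does not talk to Elasticsearch at all; the function name `top_salary_bucket` and its parameters are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# The five sample documents from above.
EMPLOYEES = [
    {"name": "Tomcat", "department": "Dev", "salary": 100},
    {"name": "Bobcat", "department": "QA", "salary": 90},
    {"name": "Beast", "department": "QA", "salary": 100},
    {"name": "Tom", "department": "Dev", "salary": 100},
    {"name": "Bob", "department": "Dev", "salary": 90},
]


def top_salary_bucket(docs, bucket_count=1, hits_per_bucket=10):
    """Mimic terms(order by _key desc, size=bucket_count) + top_hits(size=hits_per_bucket)."""
    # Terms aggregation: one bucket per distinct salary value.
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["salary"]].append(doc)
    # "order": {"_key": "desc"} sorts the buckets by salary, highest first;
    # "size": 1 keeps only the first of them.
    top_keys = sorted(buckets, reverse=True)[:bucket_count]
    # top_hits returns at most hits_per_bucket documents from each kept bucket.
    return [(key, buckets[key][:hits_per_bucket]) for key in top_keys]


if __name__ == "__main__":
    for salary, docs in top_salary_bucket(EMPLOYEES):
        print(salary, [d["name"] for d in docs])
```

One thing the sketch makes visible: the top_hits size of 10 is a cap. If more than 10 employees share the highest salary, only 10 of them come back, so raise that value if your data can have more ties.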
Scenario 1 response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_employees_salary" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 2,
"buckets" : [
{
"key" : 100.0,
"doc_count" : 3,
"my_employees" : {
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Tomcat",
"department" : "Dev",
"salary" : 100
}
},
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "Beast",
"department" : "QA",
"salary" : 100
}
},
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Tom",
"department" : "Dev",
"salary" : 100
}
}
]
}
}
}
]
}
}
}
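Since the matching employees are nested a few levels deep in the response, a small helper can flatten them out on the client side. This is only a sketch: `employees_from_response` is a hypothetical name, it assumes the response has already been parsed into a Python dict, and it assumes the aggregation names used in the request (`my_employees_salary` and `my_employees`).

```python
def employees_from_response(response,
                            terms_agg="my_employees_salary",
                            hits_agg="my_employees"):
    """Return the _source of every top_hits document in every returned bucket."""
    buckets = response.get("aggregations", {}).get(terms_agg, {}).get("buckets", [])
    employees = []
    for bucket in buckets:
        for hit in bucket[hits_agg]["hits"]["hits"]:
            employees.append(hit["_source"])
    return employees


# Trimmed-down copy of the Scenario 1 response, keeping only the fields the helper reads.
sample_response = {
    "aggregations": {
        "my_employees_salary": {
            "buckets": [
                {
                    "key": 100.0,
                    "doc_count": 3,
                    "my_employees": {
                        "hits": {
                            "hits": [
                                {"_id": "1", "_source": {"name": "Tomcat", "department": "Dev", "salary": 100}},
                                {"_id": "3", "_source": {"name": "Beast", "department": "QA", "salary": 100}},
                                {"_id": "4", "_source": {"name": "Tom", "department": "Dev", "salary": 100}},
                            ]
                        }
                    },
                }
            ]
        }
    }
}

if __name__ == "__main__":
    for employee in employees_from_response(sample_response):
        print(employee["name"], employee["department"], employee["salary"])
```

If the index is empty or the filter matches nothing, the buckets list is empty and the helper simply returns an empty list.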
Scenario 2: Return all employees with the highest salary in a particular department
POST my-salary-index/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"department": "Dev"
}
}
]
}
},
"aggs": {
"my_employees_salary":{
"terms": {
"field": "salary",
"size": 1,
"order": {
"_key": "desc"
}
},
"aggs": {
"my_employees": {
"top_hits": {
"size": 10
}
}
}
}
}
}
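There are many ways to do this, but the basic idea is to filter the documents first and then apply the aggregation on top of the filtered set. That is more efficient.
Note that I've just added a bool condition to the aggregation query from the Scenario 1 solution.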
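Because the two scenarios differ only in the query part, the request body can be built by one function. The sketch below is a hypothetical helper (`build_top_salary_request` is not part of any Elasticsearch client) that returns the body as a plain dict. As one of the "many ways" mentioned above, it puts the department condition in the bool query's `filter` clause rather than `must`; both select the same documents, but `filter` skips relevance scoring, which is not needed here.

```python
import json


def build_top_salary_request(department=None, hits_per_bucket=10):
    """Build the search body: department=None gives Scenario 1, otherwise Scenario 2."""
    body = {
        "size": 0,  # no top-level hits, only the aggregation result
        "aggs": {
            "my_employees_salary": {
                # Highest salary first, keep only that one bucket.
                "terms": {"field": "salary", "size": 1, "order": {"_key": "desc"}},
                # Return the documents that fall into that bucket.
                "aggs": {"my_employees": {"top_hits": {"size": hits_per_bucket}}},
            }
        },
    }
    if department is not None:
        # Filter context: restricts the documents before aggregating, without scoring.
        body["query"] = {"bool": {"filter": [{"term": {"department": department}}]}}
    return body


if __name__ == "__main__":
    # Print a body that can be sent to POST my-salary-index/_search.
    print(json.dumps(build_top_salary_request("Dev"), indent=2))
```

With the `filter` variant the hits come back with a `_score` of 0.0 instead of the 0.53899646 produced by the `must` version; the set of returned employees is the same.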
Scenario 2 response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_employees_salary" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 100.0,
"doc_count" : 2,
"my_employees" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.53899646,
"hits" : [
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.53899646,
"_source" : {
"name" : "Tomcat",
"department" : "Dev",
"salary" : 100
}
},
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.53899646,
"_source" : {
"name" : "Tom",
"department" : "Dev",
"salary" : 100
}
}
]
}
}
}
]
}
}
}
If you have the full X-Pack or a licensed version of X-Pack, you could also consider using SQL Access.
Hope this helps.