Elasticsearch metric aggregation: number of elements in array
I want to do a fairly involved query/aggregation, and I can't see how, since I've only just started using ES. My documents look like this:
{
"keyword": "some keyword",
"items": [
{
"name":"my first item",
"item_property_1":"A",
( other properties here )
},
{
"name":"my second item",
"item_property_1":"B",
( other properties here )
},
{
"name":"my third item",
"item_property_1":"A",
( other properties here )
}
]
( other properties... )
},
{
"keyword": "different keyword",
"items": [
{
"name":"cool item",
"item_property_1":"A",
( other properties here )
},
{
"name":"awesome item",
"item_property_1":"C",
( other properties here )
}
]
( other properties... )
},
( other documents... )
Now, what I want to do is, for each keyword, count how many items have each of the several possible values of item_property_1. That is, I want a bucket aggregation with a response like this:
{
"keyword": "some keyword",
"item_property_1_aggregation": [
{
"key":"A",
"count": 2
},
{
"key":"B",
"count": 1
}
]
},
{
"keyword": "different keyword",
"item_property_1_aggregation": [
{
"key":"A",
"count": 1
},
{
"key":"C",
"count": 1
}
]
},
( other keywords... )
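If a mapping is needed, could you spell it out? I don't have any non-default mappings; I just threw everything in.
Edit:
To save you the trouble, here is a bulk PUT of the example above: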
PUT /test/test/_bulk
{ "index": {}}
{ "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
{ "index": {}}
{ "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
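The bulk body above is newline-delimited JSON: an action line ({ "index": {}}) followed by the document source, one JSON object per line, and the whole body must end with a newline. If you'd rather generate it than type it, here is a minimal Python sketch; build_bulk_body is a hypothetical helper, not part of any Elasticsearch client.

```python
import json

def build_bulk_body(docs):
    """Hypothetical helper: build a _bulk NDJSON body that indexes each doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action/metadata line
        lines.append(json.dumps(doc))            # document source on one line
    return "\n".join(lines) + "\n"               # bulk bodies must end with "\n"

# The same two sample documents as the bulk request above.
docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]

body = build_bulk_body(docs)
print(body, end="")
```

json.dumps emits each object on a single line by default, which is what the bulk format requires.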
Edit 2:
I just tried this:
POST /test/test/_search
{
"size":2,
"aggregations": {
"property_1_count": {
"terms":{
"field":"item_property_1"
}
}
}
}
and got this:
"aggregations": {
"property_1_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 2
},
{
"key": "b",
"doc_count": 1
},
{
"key": "c",
"doc_count": 1
}
]
}
}
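Close, but no cigar. You can see what's happening: it buckets on every item_property_1 regardless of which keyword it belongs to. I'm sure the solution involves adding the right mapping, but I can't pin it down. Suggestions?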
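One way to see why those exact numbers come back: a terms aggregation's doc_count counts matching documents, not array elements. The first document has two A items but only adds 1 to the a bucket, and since nothing buckets by keyword first, both documents fall into the same a bucket. The lowercase keys come from the field being mapped dynamically as an analyzed string, so the standard analyzer lowercased the values. The Python below is a rough model of that behaviour, not how Elasticsearch works internally; naive_terms_agg is a hypothetical name, and .lower() is only a stand-in for the analyzer.

```python
from collections import Counter

# The two sample documents from the bulk request above.
docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]

def naive_terms_agg(docs, field):
    """Rough model of a top-level terms agg on an array-of-objects field
    under the default (object, analyzed-string) mapping."""
    counts = Counter()
    for doc in docs:
        # A set: each document counts once per value, however many items match.
        terms = {item[field].lower() for item in doc["items"]}
        counts.update(terms)
    return dict(counts)

print(naive_terms_agg(docs, "item_property_1"))
# counts: a -> 2, b -> 1, c -> 1 (key order in the printed dict may vary)
```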
Edit 3:
Based on this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-nested-type.html
I wanted to try adding the nested type to the items property. To do that, I tried:
PUT /test/_mapping/test
{
"test":{
"properties": {
"items": {
"type": "nested",
"properties": {
"item_property_1":{"type":"string"}
}
}
}
}
}
However, this returns an error:
{
"error": "MergeMappingException[Merge failed with failures {[object mapping [items] can't be changed from non-nested to nested]}]",
"status": 400
}
This is probably related to the warning at that URL: "changing an object type to nested type requires reindexing."
So, how do I do this?
Nice try, you were almost there! Here's what I came up with. Following your mapping suggestion, this is the mapping I used:
curl -XPUT localhost:9200/test/_mapping/test -d '{
"test": {
"properties": {
"keyword": {
"type": "string",
"index": "not_analyzed"
},
"items": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"item_property_1": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
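Note: you'll need to wipe your data (e.g. delete and recreate the test index) and reindex, because you can't change a field's type from non-nested to nested.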
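Why nested matters here: with a plain object mapping, an array of objects is flattened into one multi-valued field per property (the nested-type page linked in the question describes this), so the link between a given item's name and its item_property_1 is lost. A nested mapping indexes each item as its own hidden sub-document instead. The Python below is a simplified model of those two layouts, not Elasticsearch internals; all function names are hypothetical.

```python
def flatten_object(doc):
    """Model of an object mapping: each items.* property becomes one flat list."""
    flat = {"keyword": [doc["keyword"]]}
    for item in doc["items"]:
        for key, value in item.items():
            flat.setdefault("items." + key, []).append(value)
    return flat

def split_nested(doc):
    """Model of a nested mapping: one sub-document per item, plus the root doc."""
    subdocs = [{"items." + k: v for k, v in item.items()} for item in doc["items"]]
    root = {"keyword": doc["keyword"]}
    return subdocs, root

def matches_flat(flat, name, prop):
    # Each condition can be satisfied by a *different* item.
    return name in flat["items.name"] and prop in flat["items.item_property_1"]

def matches_nested(subdocs, name, prop):
    # Both conditions must hold within the *same* item.
    return any(d["items.name"] == name and d["items.item_property_1"] == prop
               for d in subdocs)

doc = {"keyword": "some keyword", "items": [
    {"name": "my first item", "item_property_1": "A"},
    {"name": "my second item", "item_property_1": "B"},
]}

flat = flatten_object(doc)
subdocs, root = split_nested(doc)
print(matches_flat(flat, "my first item", "B"))       # False positive under object mapping
print(matches_nested(subdocs, "my first item", "B"))  # Correctly rejected under nested
```

Because each nested item is a separate document, a nested aggregation can count items rather than parent documents.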
Then I created some data using the bulk request you shared:
curl -XPOST localhost:9200/test/test/_bulk -d '
{ "index": {}}
{ "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
{ "index": {}}
{ "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
'
Finally, here is the aggregation query (run against /test/test/_search) that gets you the results you expect. We first bucket by keyword using a terms aggregation, and then, for each keyword, we bucket by the nested item_property_1 field. Since items is now a nested type, the key is to use a nested aggregation and then a terms sub-aggregation on the item_property_1 field.
{
"size": 0,
"aggregations": {
"by_keyword": {
"terms": {
"field": "keyword"
},
"aggs": {
"prop_1_count": {
"nested": {
"path": "items"
},
"aggs": {
"prop_1": {
"terms": {
"field": "items.item_property_1"
}
}
}
}
}
}
}
}
Running the query against your dataset produces the following results:
{
...
"aggregations" : {
"by_keyword" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "different keyword", <---- keyword 1
"doc_count" : 1,
"prop_1_count" : {
"doc_count" : 2,
"prop_1" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ { <---- buckets for item_property_1
"key" : "A",
"doc_count" : 1
}, {
"key" : "C",
"doc_count" : 1
} ]
}
}
}, {
"key" : "some keyword", <---- keyword 2
"doc_count" : 1,
"prop_1_count" : {
"doc_count" : 3,
"prop_1" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ { <---- buckets for item_property_1
"key" : "A",
"doc_count" : 2
}, {
"key" : "B",
"doc_count" : 1
} ]
}
}
} ]
}
}
}
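Elasticsearch returns the nested bucket structure shown above, not the flat shape sketched in the question, so any reshaping has to happen client-side. Inside the nested aggregation, doc_count counts nested item documents, which is why it can serve as the per-item count. A minimal Python sketch, assuming the aggregation names by_keyword, prop_1_count and prop_1 from the query above and a response already parsed into a dict (here via json.loads); reshape is a hypothetical helper.

```python
import json

def reshape(response):
    """Turn by_keyword -> nested -> terms buckets into one entry per keyword."""
    result = []
    for kw_bucket in response["aggregations"]["by_keyword"]["buckets"]:
        prop_buckets = kw_bucket["prop_1_count"]["prop_1"]["buckets"]
        result.append({
            "keyword": kw_bucket["key"],
            # Inside the nested agg, doc_count is the number of matching items.
            "item_property_1_aggregation": [
                {"key": b["key"], "count": b["doc_count"]} for b in prop_buckets
            ],
        })
    return result

# Abbreviated copy of the "aggregations" part of the response above.
response = json.loads("""
{"aggregations": {"by_keyword": {"buckets": [
  {"key": "different keyword", "doc_count": 1,
   "prop_1_count": {"doc_count": 2, "prop_1": {"buckets": [
     {"key": "A", "doc_count": 1}, {"key": "C", "doc_count": 1}]}}},
  {"key": "some keyword", "doc_count": 1,
   "prop_1_count": {"doc_count": 3, "prop_1": {"buckets": [
     {"key": "A", "doc_count": 2}, {"key": "B", "doc_count": 1}]}}}
]}}}
""")

for entry in reshape(response):
    print(entry)
```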