How to aggregate over matched fields in an array in Elasticsearch
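I have objects with an array called properties. Each property is itself an object with the fields attribute and value (plus a few others that don't matter here).
I want to find all values for a certain attribute.
My current approach is a filtered query on properties.attribute followed by an aggregation on properties.value. This is not sufficient, because the aggregation uses the values of every attribute defined on the matching documents, not just those of the properties.attribute I searched for.
Is there a way to limit the aggregation 'space' to the properties whose properties.attribute matches?
For completeness, here is the curl call. It finds many values, but I'm only interested in those for 'farbe' (color):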
curl -XGET 'http://localhost:9200/pwo/Product/_search?size=0&pretty=true' -d '{
  "query": {
    "filtered": {
      "query": { "match_all" : { } },
      "filter": {
        "bool": {
          "must": { "term": { "properties.attribute": "farbe" } }
        }
      }
    }
  },
  "aggregations": {
    "properties": {
      "terms": { "field": "properties.value" }
    }
  }
}'
A combination of a nested aggregation and a filter aggregation seems to do what you want, if I understand you correctly.
You will have to set up your mapping with a nested type, though.
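To see why the nested type matters, here is a minimal Python sketch of the difference. It is not Elasticsearch code: the `flatten` helper and the sample values ("rot", "groesse", "xl") are made up for illustration. It only mimics the documented behaviour that a plain object array is indexed as one multi-valued field per leaf, so the pairing between attribute and value is lost.

```python
from collections import defaultdict

# One sample document with two property objects (values are hypothetical).
doc = {
    "properties": [
        {"attribute": "farbe", "value": "rot"},
        {"attribute": "groesse", "value": "xl"},
    ]
}

def flatten(document):
    """Mimic a non-nested object array: one multi-valued field per leaf,
    with the pairing between sibling fields lost."""
    flat = defaultdict(list)
    for prop in document["properties"]:
        for key, val in prop.items():
            flat["properties." + key].append(val)
    return dict(flat)

flat_doc = flatten(doc)
# The document as a whole matches the filter on "farbe" ...
matches = "farbe" in flat_doc["properties.attribute"]
# ... so an aggregation on the flat field sees every value in the document.
flat_values = flat_doc["properties.value"] if matches else []

# With nested objects each property is matched on its own,
# so the filter applies per property rather than per document.
nested_values = [p["value"] for p in doc["properties"] if p["attribute"] == "farbe"]

print(flat_values)    # ['rot', 'xl']
print(nested_values)  # ['rot']
```

The flat case is what the query in the question runs into: the filter selects whole documents, and the aggregation then counts all of their values.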
As a toy example, I set up a simple index like this:
PUT /test_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "properties": {
          "type": "nested",
          "properties": {
            "attribute": {
              "type": "string"
            },
            "value": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
(Note that this is a bit confusing, because in this example "properties" is both a mapping keyword and the name of a field.)
Now I can index some documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"properties":[{"attribute":"lorem","value":"Donec a diam lectus."},{"attribute":"ipsum","value":"Sed sit amet ipsum mauris."}]}
{"index":{"_id":2}}
{"properties":[{"attribute":"dolor","value":"Donec et mollis dolor."},{"attribute":"sit","value":"Donec sed odio eros."}]}
{"index":{"_id":3}}
{"properties":[{"attribute":"amet","value":"Vivamus fermentum semper porta."}]}
Then I can get an aggregation on "properties.value", filtered by "properties.attribute", like this:
POST /test_index/_search?search_type=count
{
  "aggs": {
    "nested_properties": {
      "nested": {
        "path": "properties"
      },
      "aggs": {
        "filtered_by_attribute": {
          "filter": {
            "terms": {
              "properties.attribute": [
                "lorem",
                "amet"
              ]
            }
          },
          "aggs": {
            "value_terms": {
              "terms": {
                "field": "properties.value"
              }
            }
          }
        }
      }
    }
  }
}
which in this case returns:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "nested_properties": {
      "doc_count": 5,
      "filtered_by_attribute": {
        "doc_count": 2,
        "value_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "a",
              "doc_count": 1
            },
            {
              "key": "diam",
              "doc_count": 1
            },
            {
              "key": "donec",
              "doc_count": 1
            },
            {
              "key": "fermentum",
              "doc_count": 1
            },
            {
              "key": "lectus",
              "doc_count": 1
            },
            {
              "key": "porta",
              "doc_count": 1
            },
            {
              "key": "semper",
              "doc_count": 1
            },
            {
              "key": "vivamus",
              "doc_count": 1
            }
          ]
        }
      }
    }
  }
}
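To make the numbers in that response easier to follow, here is a rough Python simulation of the three aggregation steps over the same three documents. It is only a sketch: `tokenize` is a crude stand-in I made up for the default analyzer (lowercase, split on non-letters), not what Elasticsearch actually runs.

```python
import re
from collections import Counter

# The three documents indexed above.
docs = [
    {"properties": [{"attribute": "lorem", "value": "Donec a diam lectus."},
                    {"attribute": "ipsum", "value": "Sed sit amet ipsum mauris."}]},
    {"properties": [{"attribute": "dolor", "value": "Donec et mollis dolor."},
                    {"attribute": "sit", "value": "Donec sed odio eros."}]},
    {"properties": [{"attribute": "amet", "value": "Vivamus fermentum semper porta."}]},
]

def tokenize(text):
    # Hypothetical stand-in for the analyzer: lowercase, split on non-letters.
    return re.findall(r"[a-z]+", text.lower())

# "nested" step: every property object is treated as its own document.
nested = [prop for doc in docs for prop in doc["properties"]]

# "filter" step: keep only the properties with a wanted attribute.
wanted = {"lorem", "amet"}
filtered = [prop for prop in nested if prop["attribute"] in wanted]

# "terms" step: per token, count how many filtered properties contain it.
counts = Counter()
for prop in filtered:
    counts.update(set(tokenize(prop["value"])))

# Order buckets by count (descending), then by key.
buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(len(nested), len(filtered))  # 5 2
print(buckets)
```

The two printed counts correspond to the doc_count of 5 under nested_properties and 2 under filtered_by_attribute. The bucket keys are single words because the string fields are analyzed. If you want whole values such as color names as buckets, the value field would need to be mapped as not analyzed ("index": "not_analyzed" on the string type in this Elasticsearch generation).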
Here is the code I used all together:
http://sense.qbox.io/gist/1e0c58aae54090fadfde8856f4f6793b68de0167
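Applied to the question, the request body for a single attribute such as 'farbe' might be built like the sketch below. The `build_value_aggregation` helper is hypothetical, and the body assumes that properties has been re-mapped as a nested type in the pwo index, which the question does not confirm.

```python
import json

def build_value_aggregation(attribute, path="properties"):
    """Return a search body that aggregates <path>.value for one attribute.
    Assumes <path> is mapped as a nested type."""
    return {
        "size": 0,  # only the aggregation is of interest, not the hits
        "aggs": {
            "nested_properties": {
                "nested": {"path": path},
                "aggs": {
                    "filtered_by_attribute": {
                        "filter": {"term": {path + ".attribute": attribute}},
                        "aggs": {
                            "value_terms": {"terms": {"field": path + ".value"}}
                        },
                    }
                },
            }
        },
    }

body = build_value_aggregation("farbe")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of the same /pwo/Product/_search call used in the question, in place of the filtered query.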