Elasticsearch: aggregation filtering
For simplicity, suppose I have an index in Elasticsearch with 3 rows, including:
{"id": 1, "tags": ["t1", "t2", "t3"]},
{"id": 2, "tags": ["t1", "t4", "t5"]}
I need to aggregate by a few specific tags, without getting back the other tags that happen to appear in the matching documents:
{
  "aggs": {
    "tags": {
      "terms": {"field": "tags"}
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {"tags": ["t1", "t2"]}
        }
      ]
    }
  }
}
# RESULT
{
  "aggregations": {
    "tags": {
      "buckets": [
        {"doc_count": 2, "key": "t1"},
        {"doc_count": 1, "key": "t2"},
        {"doc_count": 1, "key": "t3"},  # should be removed by filter
        {"doc_count": 1, "key": "t4"},  # should be removed by filter
        {"doc_count": 1, "key": "t5"}   # should be removed by filter
      ]
    }
  },
  "hits": {
    "hits": [],
    "max_score": 0.0,
    "total": 2
  }
}
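How can I (maybe) post-filter this result?
With 3 rows in the index, that is only 3 extra items (t3, t4, t5). But in the real case my index has more than 200K rows, and the result is terrible: I need to aggregate by 50 tags, but I get back more than 1K tags.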
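As a rough illustration of the client-side post-filtering asked about here, the unwanted buckets could be dropped after the response arrives. This is only a sketch: `keep_requested_buckets` is a hypothetical helper name, and the `response` dict is a trimmed copy of the result shown above, not the output of a live cluster.

```python
def keep_requested_buckets(response, agg_name, wanted_tags):
    """Return only the buckets whose key is one of the requested tags.

    `response` is assumed to be the parsed JSON body of a search response
    that contains a terms aggregation called `agg_name`.
    """
    wanted = set(wanted_tags)
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket for bucket in buckets if bucket["key"] in wanted]


# Trimmed copy of the result from the question (illustrative data only).
response = {
    "aggregations": {
        "tags": {
            "buckets": [
                {"doc_count": 2, "key": "t1"},
                {"doc_count": 1, "key": "t2"},
                {"doc_count": 1, "key": "t3"},
                {"doc_count": 1, "key": "t4"},
                {"doc_count": 1, "key": "t5"},
            ]
        }
    }
}

filtered = keep_requested_buckets(response, "tags", ["t1", "t2"])
print(filtered)
```

This only trims what the client keeps. Elasticsearch still computes and returns every bucket up to the aggregation's `size` (10 by default), so a requested tag that falls outside the top buckets would be missing from the response altogether.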
Assuming your version of Elasticsearch supports it, you should use the "include" parameter of the terms aggregation. Your query should look like this:
POST /test/_search
{
  "aggs": {
    "tags": {
      "terms": {"field": "tags", "include": ["t1", "t2"]}
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {"tags": ["t1", "t2"]}
        }
      ]
    }
  }
}
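For the real case with about 50 tags, the same request body could be generated from a list rather than written by hand. The following is a minimal sketch, not a definitive implementation: `build_tag_aggregation` is a hypothetical helper, and the top-level `"size": 0` and the aggregation-level `"size"` are assumptions added here that are not part of the query above. The aggregation-level `size` is included because a terms aggregation returns only 10 buckets by default, which would cut off a list of 50 tags.

```python
import json


def build_tag_aggregation(tags, field="tags"):
    """Build a search body that filters and aggregates on the same tag list.

    Hypothetical helper: `field` defaults to the "tags" field used in the
    question's example documents.
    """
    tags = list(tags)
    if not tags:
        raise ValueError("at least one tag is required")
    return {
        # Assumption: only the aggregation is needed, so no hits are requested.
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {field: tags}}]}},
        "aggs": {
            "tags": {
                "terms": {
                    "field": field,
                    # Restrict the buckets to exactly the requested values.
                    "include": tags,
                    # Assumption: raise the bucket limit to cover every tag.
                    "size": len(tags),
                }
            }
        },
    }


body = build_tag_aggregation(["t1", "t2"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `POST /test/_search`, for example from the Kibana console or with any HTTP client.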