Elasticsearch: Aggregation of null fields in a facet bucket

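I am trying to implement facets with a date range aggregation in the current version of Amazon Elasticsearch Service (version 7.10). The key I want the article documents grouped by is publishedAt, which is a date. I want one bucket where publishedAt is in the past, which means the article is published; one where it is in the future, which means scheduled; and one for all articles without a publishedAt, which are drafts. published and scheduled are working as they should. For drafts, I can't enter a filter or a date range, because the values are null. So I want to make use of the "Missing Values" feature. This should treat documents with publishedAt = null as if they had the date given in the missing field. Unfortunately, it has no effect on the results, even if I change the missing date so that it matches published or scheduled.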
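A minimal Python sketch of the behavior expected from missing: documents without a value are counted as if they carried the substitute date. This is a plain in-memory model, not Elasticsearch code. The function name date_range_counts and the sample dates are made up for illustration, and it assumes that from is inclusive and to is exclusive.

```python
from datetime import date


def date_range_counts(values, ranges, missing=None):
    """Count values per named range; None values use `missing` if it is given."""
    counts = {key: 0 for key, _, _ in ranges}
    for value in values:
        if value is None:
            if missing is None:
                continue  # without a substitute, documents lacking the field are skipped
            value = missing
        for key, start, end in ranges:
            if start <= value < end:  # assumed: `from` inclusive, `to` exclusive
                counts[key] += 1
    return counts


# Hypothetical sample data: two dated articles and two without publishedAt.
published_at = [date(2020, 1, 1), None, date(2030, 1, 1), None]
ranges = [
    ("published", date(1922, 8, 1), date(2021, 8, 1)),
    ("scheduled", date(2021, 8, 1), date(2120, 8, 1)),
    ("drafts", date(1886, 1, 1), date(1886, 12, 31)),
]

print(date_range_counts(published_at, ranges, missing=date(1886, 7, 1)))
# {'published': 1, 'scheduled': 1, 'drafts': 2}
```

With missing left out, the two documents without a date are not counted in any bucket.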

My request:

GET https://es.amazonaws.com/articles/_search

{
    "size": 10,
    "aggs": {
        "facet_bucket_all": {
            "aggs": {
                "channel": {
                    "terms": {
                        "field": "channel.keyword",
                        "size": 5
                    }
                },
                "brand": {
                    "terms": {
                        "field": "brand.keyword",
                        "size": 5
                    }
                },
                "articleStatus": {
                    "date_range": {
                        "field": "publishedAt",
                        "format": "dd-MM-yyyy",
                        "missing": "01-07-1886",
                        "ranges": [
                            { "key": "published", "from": "now-99y/M", "to": "now/M" },
                            { "key": "scheduled", "from": "now+1s/M", "to": "now+99y/M" },
                            { "key": "drafts", "from": "01-01-1886", "to": "31-12-1886" }
                        ]
                    }
                }
            },
            "filter": {
                "bool": {
                    "must": []
                }
            }
        },
        "facet_bucket_publishedAt": {
            "aggs": {},
            "filter": {
                "bool": {
                    "must": []
                }
            }
        },
        "facet_bucket_author": {
            "aggs": {
                "author": {
                    "terms": {
                        "field": "author",
                        "size": 10
                    }
                }
            },
            "filter": {
                "bool": {
                    "must": []
                }
            }
        }
    },
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "publishedAt": {
                            "lte": "2021-08-09T09:52:19.975Z"
                        }
                    }
                }
            ]
        }
    },
    "from": 0,
    "sort": [
        {
            "_score": "desc"
        }
    ]
}

In the result, drafts is empty:

"articleStatus": {
    "buckets": [
        {
            "key": "published",
            "from": -1.496448E12,
            "from_as_string": "01-08-1922",
            "to": 1.627776E12,
            "to_as_string": "01-08-2021",
            "doc_count": 47920
        },
        {
            "key": "scheduled",
            "from": 1.627776E12,
            "from_as_string": "01-08-2021",
            "to": 4.7519136E12,
            "to_as_string": "01-08-2120",
            "doc_count": 3
        },
        {
            "key": "drafts",
            "from": 1.67252256E13,
            "from_as_string": "01-01-1886",
            "to": 1.67566752E13,
            "to_as_string": "31-12-1886",
            "doc_count": 0
        }
    ]
}

SearchKit adds this part to the query:

"query": {
    "bool": {
        "filter": [
            {
                "range": {
                    "publishedAt": {
                        "lte": "2021-08-09T09:52:19.975Z"
                    }
                }
            }
        ]
    }
}

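It has to be removed, because it filters out the documents whose publishedAt is null before the aggregation runs, so the missing setting never gets a chance to take effect.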
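One way to drop that clause before the request is sent, sketched in Python on the plain request body. The helper name drop_range_filter is hypothetical and this is not SearchKit's API. It assumes query.bool.filter is a list, as in the request above, and leaves the body unchanged otherwise.

```python
import copy


def drop_range_filter(body, field):
    """Return a copy of `body` without range clauses on `field` in query.bool.filter."""
    cleaned = copy.deepcopy(body)  # never mutate the caller's request body
    bool_query = cleaned.get("query", {}).get("bool", {})
    clauses = bool_query.get("filter")
    if not isinstance(clauses, list):
        return cleaned  # nothing to strip, or a shape this sketch does not handle
    # Keep every clause that is not a range filter on the given field.
    bool_query["filter"] = [
        clause for clause in clauses
        if field not in clause.get("range", {})
    ]
    return cleaned


request_body = {
    "size": 10,
    "query": {
        "bool": {
            "filter": [
                {"range": {"publishedAt": {"lte": "2021-08-09T09:52:19.975Z"}}}
            ]
        }
    },
}

print(drop_range_filter(request_body, "publishedAt")["query"])
# {'bool': {'filter': []}}
```

Note that removing the clause also changes which hits are returned, not only the bucket counts.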

Now I get the correct result:

"articleStatus": {
    "buckets": [
        {
            "key": "drafts",
            "from": -2.650752E12,
            "from_as_string": "01-01-1886",
            "to": -2.6193024E12,
            "to_as_string": "31-12-1886",
            "doc_count": 7
        },
        {
            "key": "published",
            "from": -1.496448E12,
            "from_as_string": "01-08-1922",
            "to": 1.627776E12,
            "to_as_string": "01-08-2021",
            "doc_count": 47920
        },
        {
            "key": "scheduled",
            "from": 1.627776E12,
            "from_as_string": "01-08-2021",
            "to": 4.7519136E12,
            "to_as_string": "01-08-2120",
            "doc_count": 3
        }
    ]
}
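As a possible alternative that avoids the placeholder date in 1886, the three statuses could probably also be expressed with a filters aggregation combined with range and exists queries, all of which are part of the Elasticsearch query DSL. The sketch below only builds the aggregation body as a Python dict. The helper name article_status_filters_agg is hypothetical, it has not been run against this index, and it still requires that the top-level query does not exclude documents without publishedAt.

```python
def article_status_filters_agg(field="publishedAt"):
    """Build a filters aggregation body with one named bucket per article status."""
    return {
        "filters": {
            "filters": {
                # Date is in the past: the article is published.
                "published": {"range": {field: {"lt": "now"}}},
                # Date is now or in the future: the article is scheduled.
                "scheduled": {"range": {field: {"gte": "now"}}},
                # No date at all: the article is a draft.
                "drafts": {"bool": {"must_not": {"exists": {"field": field}}}},
            }
        }
    }


# Would take the place of the date_range definition under "articleStatus".
aggs = {"articleStatus": article_status_filters_agg()}
```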