Hyperunique Aggregations in Calcite-Druid Adapter

Question

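In my Druid datasource, I have a hyperUnique aggregation on one of the fields, computed at ingestion time.

I am trying to perform the equivalent of COUNT(DISTINCT(<hyperunique_field>)) on this rolled-up field.

Does the Calcite Druid adapter support this? If so, what is the correct way to do it?

In Plywood, I can do this with COUNT_DISTINCT. Running the SQL below returns a count of 0.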

SQL:

select floor("__time" to HOUR) time_bucket, "field_1",
       count(distinct("ingestion_time_aggregated_field")) as uniq
from "datasource"
where "__time" between '2017-01-01 00:00:00' and '2017-01-02 00:00:00'
  and "field_1" in ('value_1')
  and "field_2" = 'value_2'
  and "field_3" = 'value_3'
  and "field_4" = 'value_4'
group by floor("__time" to HOUR), "field_1"
order by floor("__time" to HOUR);

ingestion_time_aggregated_field:

{"name": "ingestion_time_aggregated_field", "type": "hyperUnique", "fieldName": "field"}

Answer 1

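The Calcite Druid adapter does not support complex aggregators. The reason is that HLL (HyperLogLog) yields an approximation rather than an exact value, so strictly speaking it does not answer a distinct-count query.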
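
If an approximate count is acceptable and you can query Druid directly rather than through the adapter, the same aggregation might be expressed as a native groupBy query along the lines of the sketch below. It reuses the datasource, field, and value names from the question and assumes the time range maps to a single half-open Druid interval (the SQL BETWEEN is inclusive at both ends, so the upper boundary is handled slightly differently). Treat it as an illustration under those assumptions, not a tested equivalent of the SQL.

```json
{
  "queryType": "groupBy",
  "dataSource": "datasource",
  "granularity": "hour",
  "intervals": ["2017-01-01T00:00:00/2017-01-02T00:00:00"],
  "dimensions": ["field_1"],
  "filter": {
    "type": "and",
    "fields": [
      {"type": "selector", "dimension": "field_1", "value": "value_1"},
      {"type": "selector", "dimension": "field_2", "value": "value_2"},
      {"type": "selector", "dimension": "field_3", "value": "value_3"},
      {"type": "selector", "dimension": "field_4", "value": "value_4"}
    ]
  },
  "aggregations": [
    {
      "type": "hyperUnique",
      "name": "uniq",
      "fieldName": "ingestion_time_aggregated_field"
    }
  ]
}
```

The hyperUnique aggregator at query time merges the HLL sketches stored at ingestion, so "uniq" is an estimate of the distinct count per hour and per "field_1" value, not an exact figure.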

druid

apache-calcite