How can I replace this correlated subquery within a function call?

Given the following tables, buckets (top) and points (bottom):

metric_id|start_date         |bucket
------------------------------------
a        |2019-12-05 00:00:00|1
a        |2019-12-06 00:00:00|2
b        |2021-10-31 00:00:00|1
b        |2021-11-01 00:00:00|2

point_id|metric_id|timestamp
----------------------------
1       |a        |2019-12-05 00:00:00
2       |a        |2019-12-06 00:00:00
3       |b        |2021-10-31 00:00:00
4       |b        |2021-11-01 00:00:00

and the following query:

select
       p.metric_id,
       bucket
from points p
left join width_bucket(p.timestamp, (select array(select start_date
                                                  from buckets b
                                                  where b.metric_id = p.metric_id -- correlated sub-query
                                                  order by start_date -- width_bucket needs sorted thresholds
                                                  ))) as bucket on true

Output:

metric_id|bucket
-----------------
a        |1
a        |2
b        |1
b        |2

How can I remove the correlated subquery to improve performance?

Currently there are ~280,000 points × ~650 buckets ≈ 180,000,000 iterations, which is very slow!

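Basically, I want to remove the correlated subquery and build the bucket thresholds for width_bucket only once per unique metric_id in buckets, so that performance improves while the query still returns correct time-series data.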

How can I do this in Postgres 13?

You can rewrite your query as:

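select
    p.metric_id,
    -- thresholds must be sorted ascending, so order the aggregated array
    width_bucket(p.timestamp, array_agg(b.start_date order by b.start_date)) bucket
from points p
left join buckets b on b.metric_id = p.metric_id
-- include point_id so points sharing metric_id and timestamp are not merged
group by p.point_id, p.metric_id, p.timestamp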
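A note on the array argument: width_bucket treats it as a list of bucket lower bounds, and the PostgreSQL documentation says the thresholds array must be sorted smallest first, or unexpected results will be obtained. array_agg without ORDER BY guarantees no particular order. The sketch below only uses literal timestamps borrowed from metric a to show how the bucket number is derived.

```sql
-- width_bucket(operand, thresholds) returns 0 when operand is below
-- thresholds[1], i when thresholds[i] <= operand < thresholds[i+1], and the
-- array length when operand is at or above the last threshold.
select width_bucket(timestamp '2019-12-04 00:00:00', t.arr) as before_first, -- 0
       width_bucket(timestamp '2019-12-05 12:00:00', t.arr) as in_first,     -- 1
       width_bucket(timestamp '2019-12-07 00:00:00', t.arr) as after_last    -- 2
from (select array[timestamp '2019-12-05 00:00:00',
                   timestamp '2019-12-06 00:00:00'] as arr) as t;
```

It returns 0, 1 and 2: a value below the first bound lands in bucket 0, and a value at or past the last bound lands in the bucket numbered by the array length.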

Adding indexes on buckets.start_date and on points (metric_id, timestamp) would also help a lot.
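The indexes suggested above could be created roughly as follows. This is a sketch against the question's tables; the index names are hypothetical, and "timestamp" is quoted only to be safe, since it is also a type name in PostgreSQL.

```sql
-- Hypothetical index names; adjust to your own naming convention.
create index if not exists buckets_start_date_idx
    on buckets (start_date);

create index if not exists points_metric_id_timestamp_idx
    on points (metric_id, "timestamp");

-- Assumption, not part of the answer: buckets is looked up by metric_id,
-- so a composite index might suit that join better. Uncomment to try it.
-- create index if not exists buckets_metric_id_start_date_idx
--     on buckets (metric_id, start_date);
```

Whether the planner actually uses these depends on the data and query, so it is worth checking the plan with EXPLAIN before and after.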

Alternatively, you can first aggregate the buckets into one array per metric_id in a CTE:

with buckets_arr as (
    select metric_id, array_agg(start_date order by start_date) arrb
    from buckets
    group by metric_id
)
select
    p.metric_id,
    width_bucket(p.timestamp, ba.arrb) bucket
from points p
join buckets_arr ba on p.metric_id = ba.metric_id
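A self-contained sketch of the CTE approach that can be pasted into psql to check it against the sample data. The two tables are replaced by VALUES lists (the column types are assumptions), and it uses a LEFT JOIN instead of the inner join, so a point whose metric_id has no buckets is kept rather than dropped; width_bucket should then return NULL for it.

```sql
with buckets (metric_id, start_date, bucket) as (
    values ('a', timestamp '2019-12-05 00:00:00', 1),
           ('a', timestamp '2019-12-06 00:00:00', 2),
           ('b', timestamp '2021-10-31 00:00:00', 1),
           ('b', timestamp '2021-11-01 00:00:00', 2)
),
points (point_id, metric_id, "timestamp") as (
    values (1, 'a', timestamp '2019-12-05 00:00:00'),
           (2, 'a', timestamp '2019-12-06 00:00:00'),
           (3, 'b', timestamp '2021-10-31 00:00:00'),
           (4, 'b', timestamp '2021-11-01 00:00:00')
),
-- one sorted thresholds array per metric, built once instead of once per point
buckets_arr as (
    select metric_id, array_agg(start_date order by start_date) as arrb
    from buckets
    group by metric_id
)
select p.point_id,
       p.metric_id,
       width_bucket(p."timestamp", ba.arrb) as bucket
from points p
left join buckets_arr ba on ba.metric_id = p.metric_id
order by p.point_id;
```

With the sample data it returns buckets 1, 2, 1, 2 for points 1 to 4, matching the expected output in the question. Each thresholds array is aggregated once per metric_id, so the per-point work is a single width_bucket lookup instead of a subquery.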