How can I replace this correlated subquery within a function call?
Given the following tables:
buckets
metric_id|start_date         |bucket
---------+-------------------+------
a        |2019-12-05 00:00:00|     1
a        |2019-12-06 00:00:00|     2
b        |2021-10-31 00:00:00|     1
b        |2021-11-01 00:00:00|     2
points
point_id|metric_id|timestamp
--------+---------+-------------------
       1|a        |2019-12-05 00:00:00
       2|a        |2019-12-06 00:00:00
       3|b        |2021-10-31 00:00:00
       4|b        |2021-11-01 00:00:00
and the following query:
select
    p.metric_id,
    bucket
from points p
left join width_bucket(
    p.timestamp,
    (select array(select start_date
                  from buckets b
                  where b.metric_id = p.metric_id -- correlated sub-query
    ))
) as bucket on true
Output:
metric_id|bucket
---------+------
a        |     1
a        |     2
b        |     1
b        |     2
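How can I remove the correlated subquery to improve performance?
Currently ~280,000 points * ~650 buckets = ~180,000,000 loops = very slow!
Basically, I want to remove the correlated subquery and build the array passed to width_bucket only once per unique metric_id in buckets, so that performance improves while the function still receives the correct time-series data.
How can this be done in Postgres 13?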
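You can rewrite your query with a plain join and an aggregate:
select
    p.metric_id,
    -- width_bucket expects the thresholds sorted ascending, so order the aggregate
    width_bucket(p.timestamp, array_agg(b.start_date order by b.start_date)) as bucket
from points p
left join buckets b on b.metric_id = p.metric_id
-- group by point_id as well, so points sharing a metric_id and timestamp are not merged
group by p.point_id, p.metric_id, p.timestamp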
Adding indexes on buckets.start_date and on points (metric_id, timestamp) should also help considerably.
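As a minimal sketch of that index suggestion, assuming the table and column names from the question (the index names here are hypothetical), the statements could look like the following. Whether the planner actually uses the indexes depends on the data, so the sketch ends with an EXPLAIN to check.

```sql
-- Index names are hypothetical; table and column names come from the question.
create index if not exists buckets_start_date_idx
    on buckets (start_date);

create index if not exists points_metric_id_timestamp_idx
    on points (metric_id, "timestamp");

-- Refresh planner statistics so the new indexes are costed sensibly.
analyze buckets;
analyze points;

-- Note: EXPLAIN ANALYZE really executes the query, so expect it to take
-- as long as the query itself.
explain (analyze, buffers)
select
    p.metric_id,
    width_bucket(p.timestamp, array_agg(b.start_date order by b.start_date)) as bucket
from points p
left join buckets b on b.metric_id = p.metric_id
group by p.point_id, p.metric_id, p.timestamp;
```

Because the join is on metric_id, a composite index on buckets (metric_id, start_date) might serve this query better than one on start_date alone. That is an assumption to verify against the plan, not something stated in the answer.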
You can aggregate the buckets first with a CTE, so that each metric_id's array is built only once:
with buckets_arr as (
    select metric_id, array_agg(start_date order by start_date) arrb
    from buckets
    group by metric_id
)
select
    p.metric_id,
    width_bucket(p.timestamp, ba.arrb) bucket
from points p
join buckets_arr ba on p.metric_id = ba.metric_id
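To try the CTE approach in isolation, here is a self-contained sketch that loads the sample rows from the question into temporary tables. The column types (text, timestamp, int) are assumptions, since the question does not give the DDL. It also uses a left join instead of the inner join, on the assumption that points without any buckets should stay in the result, as they do in the original query.

```sql
-- Run in a scratch session: temp tables shadow same-named permanent tables
-- for the duration of the session.
create temp table buckets (
    metric_id  text,        -- assumed type
    start_date timestamp,   -- assumed type
    bucket     int          -- assumed type
);

create temp table points (
    point_id    int,        -- assumed type
    metric_id   text,       -- assumed type
    "timestamp" timestamp   -- assumed type
);

-- Sample rows copied from the question.
insert into buckets (metric_id, start_date, bucket) values
    ('a', '2019-12-05 00:00:00', 1),
    ('a', '2019-12-06 00:00:00', 2),
    ('b', '2021-10-31 00:00:00', 1),
    ('b', '2021-11-01 00:00:00', 2);

insert into points (point_id, metric_id, "timestamp") values
    (1, 'a', '2019-12-05 00:00:00'),
    (2, 'a', '2019-12-06 00:00:00'),
    (3, 'b', '2021-10-31 00:00:00'),
    (4, 'b', '2021-11-01 00:00:00');

-- Build one sorted threshold array per metric_id, then look up each point.
with buckets_arr as (
    select metric_id, array_agg(start_date order by start_date) as arrb
    from buckets
    group by metric_id
)
select
    p.point_id,
    p.metric_id,
    width_bucket(p."timestamp", ba.arrb) as bucket
from points p
left join buckets_arr ba on ba.metric_id = p.metric_id
order by p.point_id;
```

With the sample data this yields buckets 1, 2, 1, 2 for points 1 to 4, matching the output in the question. With the left join, a point whose metric_id has no buckets gets a NULL bucket instead of being dropped. If such points should be excluded, switch back to the inner join.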