查询优化——如何在这个查询中实现?
Query optimization- How to achieve that in this query?
如何优化此查询?我创建了索引、分区、增加了工作内存,但执行时间仍然是 35 秒。我怎样才能将它最小化到 10-15 秒?
更新:
- 删除了每个时间戳从 utc 到本地时间的转换,即 time_stamp AT TIME ZONE 'utc' AT TIME ZONE,这将性能提高了大约 5 秒。当前执行时间:36.5 秒。
explain analyse select
DATE_TRUNC('day', time_stamp) as "time_stamp",
COUNT(DISTINCT id) AS alarm_count,
COUNT(DISTINCT patient_id) AS patient_count
FROM
alarm_management.alarm
WHERE
tenant_name = 'abc'
and
unit = ANY('{a,b,c,d,e,f,g,h,i,j,k}'::text[])
AND
time_stamp BETWEEN '2021-09-15 02:25:00' AND '2021-12-14 04:36:45'
AND
severity_label = ANY('{a,b,c,d}'::text[])
AND derived_label IS NOT NULL
GROUP by 1
解释(分析、详细、缓冲)输出-
GroupAggregate (cost=3064683.77..3215681.44 rows=308821 width=24) (actual time=24242.730..35145.380 rows=91 loops=1)
Group Key: (date_trunc('day'::text, alarm_hospitalc_burn_2021_9.time_stamp))
-> Sort (cost=3064683.77..3101468.12 rows=14713740 width=40) (actual time=24167.513..25036.293 rows=16369464 loops=1)
Sort Key: (date_trunc('day'::text, alarm_hospitalc_burn_2021_9.time_stamp))
Sort Method: quicksort Memory: 1672081kB
-> Append (cost=0.00..1312964.42 rows=14713740 width=40) (actual time=0.308..20958.290 rows=16369464 loops=1)
-> Seq Scan on alarm_hospitalc_burn_2021_9 (cost=0.00..7175.10 rows=69691 width=40) (actual time=0.307..127.521 rows=94286 loops=1)
Filter: ((derived_label IS NOT NULL) AND (time_stamp >= '2021-09-15 02:25:00'::timestamp without time zone) AND (time_stamp <= '2021-12-14 04:36:45'::timestamp without time zone) AND (tenant_name = 'HospitalC'::text) AND (severity_label = ANY ('{"Short Yellow",Cyan,Red,Yellow}'::text[])) AND (unit = ANY ('{Burn,Delivery,EDI,EDT,EDW,ICU1,ICU2,ICU3P,ICU4P,PP,Tele}'::text[])))
函数可以写成SQL,可能会稍微快一点:
CREATE OR REPLACE FUNCTION dbo.get_time_group ( _date_type TEXT )
RETURNS TEXT
LANGUAGE sql -- SQL is good enough
IMMUTABLE -- better for performance, next call is faster because of caching
AS
$$
SELECT CASE
WHEN 'hour' THEN 'hour'
ELSE 'day'
END;
$$;
但最重要的是查询计划。
如何优化此查询?我创建了索引、分区、增加了工作内存,但执行时间仍然是 35 秒。我怎样才能将它最小化到 10-15 秒? 更新:
- 删除了每个时间戳从 utc 到本地时间的转换,即 time_stamp AT TIME ZONE 'utc' AT TIME ZONE,这将性能提高了大约 5 秒。当前执行时间:36.5 秒。
explain analyse select
DATE_TRUNC('day', time_stamp) as "time_stamp",
COUNT(DISTINCT id) AS alarm_count,
COUNT(DISTINCT patient_id) AS patient_count
FROM
alarm_management.alarm
WHERE
tenant_name = 'abc'
and
unit = ANY('{a,b,c,d,e,f,g,h,i,j,k}'::text[])
AND
time_stamp BETWEEN '2021-09-15 02:25:00' AND '2021-12-14 04:36:45'
AND
severity_label = ANY('{a,b,c,d}'::text[])
AND derived_label IS NOT NULL
GROUP by 1
解释(分析、详细、缓冲)输出-
GroupAggregate (cost=3064683.77..3215681.44 rows=308821 width=24) (actual time=24242.730..35145.380 rows=91 loops=1)
Group Key: (date_trunc('day'::text, alarm_hospitalc_burn_2021_9.time_stamp))
-> Sort (cost=3064683.77..3101468.12 rows=14713740 width=40) (actual time=24167.513..25036.293 rows=16369464 loops=1)
Sort Key: (date_trunc('day'::text, alarm_hospitalc_burn_2021_9.time_stamp))
Sort Method: quicksort Memory: 1672081kB
-> Append (cost=0.00..1312964.42 rows=14713740 width=40) (actual time=0.308..20958.290 rows=16369464 loops=1)
-> Seq Scan on alarm_hospitalc_burn_2021_9 (cost=0.00..7175.10 rows=69691 width=40) (actual time=0.307..127.521 rows=94286 loops=1)
Filter: ((derived_label IS NOT NULL) AND (time_stamp >= '2021-09-15 02:25:00'::timestamp without time zone) AND (time_stamp <= '2021-12-14 04:36:45'::timestamp without time zone) AND (tenant_name = 'HospitalC'::text) AND (severity_label = ANY ('{"Short Yellow",Cyan,Red,Yellow}'::text[])) AND (unit = ANY ('{Burn,Delivery,EDI,EDT,EDW,ICU1,ICU2,ICU3P,ICU4P,PP,Tele}'::text[])))
函数可以写成SQL,可能会稍微快一点:
CREATE OR REPLACE FUNCTION dbo.get_time_group ( _date_type TEXT )
RETURNS TEXT
LANGUAGE sql -- SQL is good enough
IMMUTABLE -- better for performance, next call is faster because of caching
AS
$$
SELECT CASE
WHEN 'hour' THEN 'hour'
ELSE 'day'
END;
$$;
但最重要的是查询计划。