Slow query on a partitioned table
We are running PostgreSQL 12 and have a very large table, 108 GB including indexes. Because queries were getting slow, we tried partitioning the table, but that did not help.
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT
    val_a AS x,
    val_b AS y,
    SUM(value) AS value
FROM
    test_table_tp
WHERE
    client = '767jjDHIPLkshj'
    AND identity_1 = '12edfdijijasd'
    AND identity_2 = '98jjaskhuUUHss'
    AND identity_3 = 1
    AND date_col BETWEEN '2021-04-01'::date AND ('2021-07-01'::date + Interval '1 day')::date
GROUP BY
    val_a,
    val_b;
It takes about 3 minutes to execute:
-> HashAggregate (cost=260438.41..260718.30 rows=27989 width=16) (actual time=214427.239..214427.614 rows=1298 loops=1)
Output: test_table_tp.val_a, test_table_tp.val_b, sum(test_table_tp.value)
Group Key: test_table_tp.val_a, test_table_tp.val_b
Buffers: shared hit=216750 read=331936
I/O Timings: read=211977.843
-> Append (cost=0.56..258339.24 rows=279890 width=12) (actual time=3.057..213976.334 rows=722233 loops=1)
Buffers: shared hit=216750 read=331936
I/O Timings: read=211977.843
-> Index Scan using idx_202104_108521 on public.test_table_tp_p2021_04_108521 test_table_tp (cost=0.56..71154.06 rows=83535 width=12) (actual time=3.056..69033.315 rows=216908 loops=1)
Output: test_table_tp.val_a, test_table_tp.val_b, test_table_tp.value
Index Cond: (((test_table_tp.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp.identity_3 = 1) AND (test_table_tp.date_col >= '2021-04-01'::date) AND (test_table_tp.date_col <= '2021-07-02'::date))
Buffers: shared hit=68990 read=96437
I/O Timings: read=68466.167
-> Index Scan using idx_202105_108553 on public.test_table_tp_p2021_05_108553 test_table_tp_1 (cost=0.56..51361.84 rows=55441 width=12) (actual time=8.641..55685.999 rows=160618 loops=1)
Output: test_table_tp_1.val_a, test_table_tp_1.val_b, test_table_tp_1.value
Index Cond: (((test_table_tp_1.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp_1.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp_1.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp_1.identity_3 = 1) AND (test_table_tp_1.date_col >= '2021-04-01'::date) AND (test_table_tp_1.date_col <= '2021-07-02'::date))
Buffers: shared hit=48314 read=69911
I/O Timings: read=55277.406
-> Index Scan using idx_202106_108585 on public.test_table_tp_p2021_06_108585 test_table_tp_2 (cost=0.56..63581.82 rows=66779 width=12) (actual time=2.870..48249.339 rows=188842 loops=1)
Output: test_table_tp_2.val_a, test_table_tp_2.val_b, test_table_tp_2.value
Index Cond: (((test_table_tp_2.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp_2.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp_2.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp_2.identity_3 = 1) AND (test_table_tp_2.date_col >= '2021-04-01'::date) AND (test_table_tp_2.date_col <= '2021-07-02'::date))
Buffers: shared hit=54983 read=90249
I/O Timings: read=47732.316
-> Index Scan using idx_202107_108617 on public.test_table_tp_p2021_07_108617 test_table_tp_3 (cost=0.56..70842.08 rows=74135 width=12) (actual time=2.849..40902.561 rows=155865 loops=1)
Output: test_table_tp_3.val_a, test_table_tp_3.val_b, test_table_tp_3.value
Index Cond: (((test_table_tp_3.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp_3.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp_3.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp_3.identity_3 = 1) AND (test_table_tp_3.date_col >= '2021-04-01'::date) AND (test_table_tp_3.date_col <= '2021-07-02'::date))
Buffers: shared hit=44463 read=75339
I/O Timings: read=40501.954
Planning Time: 18.081 ms
Execution Time: 214427.963 ms
Planning Time: 0.083 ms
Execution Time: 214461.427 ms
(41 rows)
Time: 214462.391 ms (03:34.462)
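The figures in the plan can be cross-checked with a little arithmetic. The query below is only a sketch: it hard-codes numbers copied from the EXPLAIN output above (I/O read time, execution time, buffers hit and read, rows returned by the Append node) and assumes the default 8 kB block size, which the post does not state. The column aliases are made up for illustration.

```sql
-- All constants are copied from the EXPLAIN (ANALYZE, BUFFERS) output above.
SELECT
    -- share of the execution time spent waiting for block reads
    round(100 * 211977.843 / 214427.963, 1)        AS pct_time_in_reads,
    -- average wait per block that had to be read from outside shared_buffers
    round(211977.843 / 331936, 2)                  AS ms_per_block_read,
    -- volume read, assuming 8 kB blocks (cast avoids integer overflow)
    pg_size_pretty(331936::bigint * 8192)          AS volume_read,
    -- buffers touched (hit + read, index and heap together) per row returned
    round((216750 + 331936)::numeric / 722233, 2)  AS buffers_per_row;
```

Close to 99% of the run time is spent waiting on reads, and the scans touch roughly three buffers for every four rows returned (index and heap pages counted together). That pattern suggests the matching rows are scattered across many heap blocks rather than stored next to each other.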
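This is already a summary table that holds monthly aggregated data. Further aggregation is not possible either, because many different filters can be applied. The identity columns have a unique key on the table.
Is there anything that can be done to optimize this further? Any help is greatly appreciated. Thank you in advance.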
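The query will become slower with partitioning, although not by much.
The only ways to speed up this query, apart from getting faster disks or more RAM, are:
- Include val_a, val_b and value in the index and VACUUM the table, so that you get a fast index-only scan:
CREATE INDEX ON test_table_tp (
    client,      -- also filtered on by the query; an index-only scan needs every referenced column
    identity_1,
    identity_2,
    identity_3,
    date_col,
    val_a,
    val_b,
    value
);
- CLUSTER the table using the index, so that the rows are stored in fewer blocks.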
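As a rough sketch of how the two suggestions might be applied to a single partition: the partition and index names below are taken from the plan in the question, and the choice to cluster on the existing index is an assumption (the new covering index would work as well, under whatever name PostgreSQL generates for it). In PostgreSQL 12, CLUSTER cannot be run on the partitioned parent, so each partition has to be clustered separately.

```sql
-- VACUUM updates the visibility map, which lets an index-only scan
-- skip heap fetches; ANALYZE refreshes the planner statistics.
VACUUM (ANALYZE) public.test_table_tp_p2021_04_108521;

-- Rewrite the partition in index order so matching rows share blocks.
-- Caution: CLUSTER takes an ACCESS EXCLUSIVE lock for its whole duration
-- and needs enough free disk space for a full copy of the partition.
CLUSTER public.test_table_tp_p2021_04_108521 USING idx_202104_108521;

-- The physical ordering changed, so refresh the statistics again.
ANALYZE public.test_table_tp_p2021_04_108521;
```

Repeat for the other partitions the query touches, then rerun the EXPLAIN (ANALYZE, BUFFERS) from the question. If the covering index is in use, the plan should show "Index Only Scan" nodes with a low "Heap Fetches" count. CLUSTER is a one-time operation: rows inserted or updated afterwards are not kept in index order, so it may have to be repeated.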