How to search in a partitioned table in Postgres?
CREATE TABLE IF NOT EXISTS tasks
(
id bigint not null,
created_date timestamp not null,
status_code integer,
target_identity varchar(255),
updated_date timestamp,
UNIQUE (created_date, target_identity)
) PARTITION BY RANGE (created_date);
CREATE TABLE IF NOT EXISTS tasks2020_04 PARTITION OF tasks FOR VALUES FROM ('2020-04-01') TO ('2020-05-01');
CREATE INDEX IF NOT EXISTS idx_task_created_date ON tasks2020_04 (created_date);
CREATE TABLE IF NOT EXISTS tasks2020_05 PARTITION OF tasks FOR VALUES FROM ('2020-05-01') TO ('2020-06-01');
CREATE INDEX IF NOT EXISTS idx_task_created_date ON tasks2020_05 (created_date);
CREATE TABLE IF NOT EXISTS tasks2020_06 PARTITION OF tasks FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');
CREATE INDEX IF NOT EXISTS idx_task_created_date ON tasks2020_06 (created_date);
CREATE TABLE IF NOT EXISTS tasks2020_07 PARTITION OF tasks FOR VALUES FROM ('2020-07-01') TO ('2020-08-01');
CREATE INDEX IF NOT EXISTS idx_task_created_date ON tasks2020_07 (created_date);
The partitions are created without errors, which is fine.
However, when I run this query:
SET enable_partition_pruning = on;
EXPLAIN ANALYZE select * from tasks p
where p.created_date >= DATE '2020-04-01' AND p.target_identity in ('identity')
it scans every partition.
EXPLAIN ANALYZE output:
"QUERY PLAN"
"Append (cost=0.14..206.16 rows=24 width=560) (actual time=0.072..0.072 rows=0 loops=1)"
" -> Index Scan using tasks2020_04_created_date_target_identity_key on tasks2020_04 p (cost=0.14..8.58 rows=1 width=560) (actual time=0.009..0.009 rows=0 loops=1)"
" Index Cond: ((created_date >= '2020-04-01'::date) AND ((target_identity)::text = 'identity'::text))"
" -> Index Scan using tasks2020_05_created_date_target_identity_key on tasks2020_05 p_1 (cost=0.14..8.58 rows=1 width=560) (actual time=0.002..0.003 rows=0 loops=1)"
" Index Cond: ((created_date >= '2020-04-01'::date) AND ((target_identity)::text = 'identity'::text))"
" -> Index Scan using tasks2020_06_created_date_target_identity_key on tasks2020_06 p_2 (cost=0.14..8.58 rows=1 width=560) (actual time=0.002..0.002 rows=0 loops=1)"
" Index Cond: ((created_date >= '2020-04-01'::date) AND ((target_identity)::text = 'identity'::text))"
" -> Index Scan using tasks2020_07_created_date_target_identity_key on tasks2020_07 p_3 (cost=0.14..8.58 rows=1 width=560) (actual time=0.002..0.002 rows=0 loops=1)"
Why? Is an index missing, perhaps?
I did exactly the same as the example on this page: https://postgrespro.ru/docs/postgresql/12/ddl-partitioning#DDL-PARTITION-PRUNING
This is the example from that page:
SET enable_partition_pruning = on;
EXPLAIN SELECT count(*) FROM measurement WHERE log_date >= DATE '2008-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------
Aggregate (cost=37.75..37.76 rows=1 width=8)
-> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0)
Filter: (log_date >= '2008-01-01'::date)
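Version: PostgreSQL 12.0 (Debian 12.0-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
The example mentioned above is not the same, because its WHERE clause allows only one partition to be scanned: the last one. Here it is the opposite: scanning all of the partitions is expected, because every one of them can contain rows matching the clause p.created_date >= DATE '2020-04-01'. The filter on target_identity does not help with pruning, since target_identity is not the partition key. To avoid scanning every partition, you need a WHERE clause that limits the list of partitions, for example: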
EXPLAIN ANALYZE select * from tasks p
where
p.created_date >= DATE '2020-04-01'
AND p.created_date <= DATE '2020-04-30'
AND p.target_identity in ('identity');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_task_created_date on tasks2020_04 p (cost=0.14..8.17 rows=1 width=544) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: ((created_date >= '2020-04-01'::date) AND (created_date <= '2020-04-30'::date))
Filter: ((target_identity)::text = 'identity'::text)
Planning Time: 0.182 ms
Execution Time: 0.018 ms
(5 rows)
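One caveat about the upper bound: created_date is a timestamp, so `<= DATE '2020-04-30'` compares against midnight at the start of April 30 and leaves out rows created later that day. A half-open range that mirrors the partition bounds avoids this. The following is a minimal sketch against the tasks table from the question; the plan is not shown because it was not run here, but pruning should still leave only tasks2020_04.

```sql
-- Half-open range: includes all of April, excludes May 1 00:00:00.
-- The bounds are the same values used in FOR VALUES FROM ... TO ... for tasks2020_04.
EXPLAIN (COSTS OFF)
SELECT *
FROM tasks p
WHERE p.created_date >= TIMESTAMP '2020-04-01'
  AND p.created_date <  TIMESTAMP '2020-05-01'
  AND p.target_identity IN ('identity');
```

Using the same lower-inclusive, upper-exclusive convention as the partition definitions means the query range and the partition range line up exactly.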
Or (this query matches the documentation example more closely: only the last existing partition is scanned):
EXPLAIN ANALYZE select * from tasks p
where
p.created_date >= DATE '2020-07-01'
AND p.target_identity in ('identity');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using tasks2020_07_created_date_target_identity_key on tasks2020_07 p (cost=0.14..8.62 rows=1 width=544) (actual time=0.004..0.005 rows=0 loops=1)
Index Cond: ((created_date >= '2020-07-01'::date) AND ((target_identity)::text = 'identity'::text))
Planning Time: 0.074 ms
Execution Time: 0.018 ms
(4 rows)
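As for the "missing index" part of the question: this is a side note, not the cause of the scans. Index names are unique within a schema, so the repeated `CREATE INDEX IF NOT EXISTS idx_task_created_date` statements most likely created the index only on tasks2020_04 and were skipped with a notice for the other partitions. A possible alternative, assuming PostgreSQL 11 or later, is to create the index once on the partitioned table so that each partition gets its own copy. The name idx_tasks_created_date below is a hypothetical choice, not taken from the question.

```sql
-- Create the index on the parent; PostgreSQL 11+ creates a matching index
-- on every existing partition and on partitions added later.
CREATE INDEX IF NOT EXISTS idx_tasks_created_date ON tasks (created_date);

-- List the indexes that actually exist on each partition.
SELECT tablename, indexname
FROM pg_indexes
WHERE tablename LIKE 'tasks2020%'
ORDER BY tablename, indexname;
```

Since the UNIQUE (created_date, target_identity) constraint already provides an index whose leading column is created_date, a separate index on created_date alone may turn out to be redundant for these queries.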