FULL JOIN with =any doesn't use indexes
Using Postgres 9.3.5, I can't seem to get a full outer join with an =any where-clause to use the relevant indexes.
A minimal example:
create table t1(i int primary key, j int);
create table t2(i int primary key, j int);
insert into t1 (select x,x from generate_series(1,1000000) x);
insert into t2 (select x,x from generate_series(1,1000000) x);
vacuum analyze;
explain analyze
select *
from t1 full join t2 using(i)
where i =any (array[1,2]);
(In my real query, the array is a parameter and varies in length.)
I get the following query plan with sequential scans:
Hash Full Join (cost=26925.00..66350.00 rows=10000 width=16) (actual time=178.308..1251.221 rows=2 loops=1)
  Hash Cond: (t1.i = t2.i)
  Filter: (COALESCE(t1.i, t2.i) = ANY ('{1,2}'::integer[]))
  Rows Removed by Filter: 999998
  -> Seq Scan on t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.011..59.463 rows=1000000 loops=1)
  -> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=178.212..178.212 rows=1000000 loops=1)
        Buckets: 131072 Batches: 1 Memory Usage: 39063kB
        -> Seq Scan on t2 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.012..57.751 rows=1000000 loops=1)
Total runtime: 1255.734 ms
Unsuccessful things I've tried:
- using i in (1,2) or i=1 or i=2 instead of =any
- set enable_seqscan to f
Emulating the full join with a left join and an anti-join does use the indexes:
explain analyze
select * from
(select i, t1.j, t2.j from t1 left join t2 using(i)
union all
select i, null, j from t2
where not exists (select 1 from t1 where t1.i = t2.i)) sub
where i =any (array[1,2]);
Append (cost=0.85..51.61 rows=3 width=12) (actual time=0.007..0.018 rows=2 loops=1)
  -> Nested Loop Left Join (cost=0.85..29.79 rows=2 width=12) (actual time=0.007..0.010 rows=2 loops=1)
        -> Index Scan using t1_pkey on t1 (cost=0.42..12.88 rows=2 width=8) (actual time=0.003..0.005 rows=2 loops=1)
              Index Cond: (i = ANY ('{1,2}'::integer[]))
        -> Index Scan using t2_pkey on t2 (cost=0.42..8.44 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=2)
              Index Cond: (t1.i = i)
  -> Nested Loop Anti Join (cost=0.85..21.79 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
        -> Index Scan using t2_pkey on t2 t2_1 (cost=0.42..12.88 rows=2 width=8) (actual time=0.001..0.002 rows=2 loops=1)
              Index Cond: (i = ANY ('{1,2}'::integer[]))
        -> Index Only Scan using t1_pkey on t1 t1_1 (cost=0.42..4.44 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=2)
              Index Cond: (i = t2_1.i)
              Heap Fetches: 0
Total runtime: 0.065 ms
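However, this approach complicates my real query and adds duplication. Is there a better way to get Postgres to use the indexes?
Pushing the predicate down into the subqueries does the job: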
EXPLAIN ANALYZE
SELECT *
FROM (SELECT * FROM t1 WHERE i = ANY ('{1,2}')) t1
FULL JOIN (SELECT * FROM t2 WHERE i = ANY ('{1,2}')) t2 USING (i);
QUERY PLAN
Merge Full Join (cost=0.58..25.26 rows=2 width=16) (actual time=0.084..0.100 rows=2 loops=1)
  Merge Cond: (t1.i = t2.i)
  -> Index Scan using t1_pkey on t1 (cost=0.29..12.62 rows=2 width=8) (actual time=0.044..0.048 rows=2 loops=1)
        Index Cond: (i = ANY ('{1,2}'::integer[]))
  -> Index Scan using t2_pkey on t2 (cost=0.29..12.62 rows=2 width=8) (actual time=0.028..0.033 rows=2 loops=1)
        Index Cond: (i = ANY ('{1,2}'::integer[]))
Total runtime: 0.256 ms
SQL Fiddle (100k rows).
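Since the array in the real query is a parameter, the same pushdown can be written as a prepared statement, so the array is passed only once even though the predicate appears twice. This is only a sketch: the statement name full_join_by_ids is made up for illustration, and it is worth confirming with EXPLAIN ANALYZE that your version still picks the index scans.

```sql
-- Hypothetical statement name; $1 is the variable-length int array.
PREPARE full_join_by_ids(int[]) AS
SELECT *
FROM  (SELECT * FROM t1 WHERE i = ANY ($1)) t1
FULL   JOIN (SELECT * FROM t2 WHERE i = ANY ($1)) t2 USING (i);

-- Pass the array once; both subqueries filter on it.
EXECUTE full_join_by_ids('{1,2}');

-- Inspect the plan chosen for a given parameter value.
EXPLAIN ANALYZE EXECUTE full_join_by_ids('{1,2}');
```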
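Apparently, the query planner is not smart enough to conclude that a predicate on the joined column after the full join can use the indexes on the underlying tables. This could be improved.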
I can't test pg 9.4 right now. Maybe it has been improved already.
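The Filter line in the slow plan hints at why: with USING (i), the merged output column i is effectively COALESCE(t1.i, t2.i), so the WHERE clause is a predicate on that expression rather than on either table's own i. As an illustration only, the original query could be spelled out roughly like this:

```sql
-- Roughly what the original query means once USING (i) is expanded.
-- The predicate references COALESCE(t1.i, t2.i), not t1.i or t2.i directly,
-- which matches the Filter line in the Hash Full Join plan.
SELECT COALESCE(t1.i, t2.i) AS i, t1.j, t2.j
FROM   t1
FULL   JOIN t2 ON t1.i = t2.i
WHERE  COALESCE(t1.i, t2.i) = ANY (array[1,2]);
```

Filtering each side before the join, as in the subquery version, gives the planner plain predicates on t1.i and t2.i that match the primary key indexes.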
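Aside: most clients cannot handle multiple columns with the same name in the result (even though Postgres can). The two instances of j would be a problem; you must use at least one column alias, which forces you to list the columns explicitly.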
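A minimal sketch of what that could look like, with an explicit column list so each j gets a distinct name. The alias names j1 and j2 are hypothetical; pick whatever suits your client code.

```sql
-- i is the merged USING column; j1 and j2 are made-up aliases
-- that keep the two j columns apart in the result set.
SELECT i, t1.j AS j1, t2.j AS j2
FROM  (SELECT * FROM t1 WHERE i = ANY ('{1,2}')) t1
FULL   JOIN (SELECT * FROM t2 WHERE i = ANY ('{1,2}')) t2 USING (i);
```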