Why is a SQL condition missing from the psql EXPLAIN plan?
I am trying to analyze query plans for the Join Order Benchmark: https://github.com/gregrahn/join-order-benchmark
For example, I run the following command:
EXPLAIN SELECT *
FROM aka_name AS an,
cast_info AS ci,
company_name AS cn,
keyword AS k,
movie_companies AS mc,
movie_keyword AS mk,
name AS n,
title AS t
WHERE an.person_id = n.id
AND n.id = ci.person_id
AND ci.movie_id = t.id
AND t.id = mk.movie_id
AND mk.keyword_id = k.id
AND t.id = mc.movie_id
AND mc.company_id = cn.id
AND an.person_id = ci.person_id
AND ci.movie_id = mc.movie_id
AND ci.movie_id = mk.movie_id
AND mc.movie_id = mk.movie_id;
As a result, I get the following query plan:
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1973375.70..22192463.47 rows=22337517790 width=449)
   Hash Cond: (ci.movie_id = t.id)
   ->  Merge Join  (cost=102.03..2617413.84 rows=88800840 width=203)
         Merge Cond: (n.id = an.person_id)
         ->  Merge Join  (cost=0.87..2341713.60 rows=36244344 width=130)
               Merge Cond: (ci.person_id = n.id)
               ->  Index Scan using person_id_cast_info on cast_info ci  (cost=0.44..1714393.60 rows=36244344 width=56)
               ->  Index Scan using name_pkey on name n  (cost=0.43..163847.25 rows=4167379 width=74)
         ->  Materialize  (cost=0.42..69770.80 rows=901343 width=73)
               ->  Index Scan using person_id_aka_name on aka_name an  (cost=0.42..67517.44 rows=901343 width=73)
   ->  Hash  (cost=834975.33..834975.33 rows=24906348 width=246)
         ->  Hash Join  (cost=486218.85..834975.33 rows=24906348 width=246)
               Hash Cond: (mk.movie_id = t.id)
               ->  Hash Join  (cost=4885.82..131552.82 rows=4523930 width=37)
                     Hash Cond: (mk.keyword_id = k.id)
                     ->  Seq Scan on movie_keyword mk  (cost=0.00..69693.30 rows=4523930 width=12)
                     ->  Hash  (cost=2290.70..2290.70 rows=134170 width=25)
                           ->  Seq Scan on keyword k  (cost=0.00..2290.70 rows=134170 width=25)
               ->  Hash  (cost=372278.91..372278.91 rows=2609129 width=209)
                     ->  Hash Join  (cost=141184.56..372278.91 rows=2609129 width=209)
                           Hash Cond: (mc.movie_id = t.id)
                           ->  Hash Join  (cost=11266.43..106748.81 rows=2609129 width=115)
                                 Hash Cond: (mc.company_id = cn.id)
                                 ->  Seq Scan on movie_companies mc  (cost=0.00..44881.29 rows=2609129 width=40)
                                 ->  Hash  (cost=5344.97..5344.97 rows=234997 width=75)
                                       ->  Seq Scan on company_name cn  (cost=0.00..5344.97 rows=234997 width=75)
                           ->  Hash  (cost=61280.28..61280.28 rows=2528228 width=94)
                                 ->  Seq Scan on title t  (cost=0.00..61280.28 rows=2528228 width=94)
 JIT:
As you can see, the condition mc.movie_id = mk.movie_id does not appear anywhere in this plan. How and why is that possible?
Look at the last 3 conditions:
AND ci.movie_id = mc.movie_id
AND ci.movie_id = mk.movie_id
AND mc.movie_id = mk.movie_id;
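Through movie_id, you match table ci with mc, and then ci with mk. Because equality is transitive, that already implies mc matches mk, so the last condition is redundant and the planner rightly leaves it out.
This question was answered by JGH.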
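To make the transitivity argument concrete, here is a minimal Python sketch that groups the columns from the question's WHERE clause into equivalence classes with a union-find structure and reports which conditions add nothing new. It is only an illustration of the reasoning: the function names find and redundant_conditions are made up for this example, and this is not PostgreSQL's actual planner code.

```python
def find(parent, x):
    # Return the representative of x's equivalence class,
    # registering x as its own class on first sight.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def redundant_conditions(conditions):
    """Return the equality conditions already implied by earlier ones."""
    parent = {}
    redundant = []
    for left, right in conditions:
        root_left, root_right = find(parent, left), find(parent, right)
        if root_left == root_right:
            # Both columns are already known to be equal.
            redundant.append((left, right))
        else:
            # New information: merge the two classes.
            parent[root_left] = root_right
    return redundant


# The eleven equality conditions from the question, in order.
conditions = [
    ("an.person_id", "n.id"),
    ("n.id", "ci.person_id"),
    ("ci.movie_id", "t.id"),
    ("t.id", "mk.movie_id"),
    ("mk.keyword_id", "k.id"),
    ("t.id", "mc.movie_id"),
    ("mc.company_id", "cn.id"),
    ("an.person_id", "ci.person_id"),
    ("ci.movie_id", "mc.movie_id"),
    ("ci.movie_id", "mk.movie_id"),
    ("mc.movie_id", "mk.movie_id"),
]

for left, right in redundant_conditions(conditions):
    print(f"{left} = {right} is implied by earlier conditions")
```

Under this model the last four conditions are implied by the first seven, which is consistent with the plan above showing only seven join conditions. Which representative of each class ends up in the plan depends on the join order the planner picks.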
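This is not an answer, but this is the proper way to write joins (INNER is not strictly necessary, but I prefer to be explicit). FROM table1, table2 WHERE ... is a bad habit that you should drop right away. That syntax is not very flexible, and it makes even simple queries nearly unreadable.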
SELECT *
FROM aka_name AS an
INNER JOIN name AS n
    ON an.person_id = n.id
INNER JOIN cast_info AS ci
    ON an.person_id = ci.person_id
    AND n.id = ci.person_id
INNER JOIN title AS t
    ON ci.movie_id = t.id
INNER JOIN movie_keyword AS mk
    ON t.id = mk.movie_id
    AND ci.movie_id = mk.movie_id
INNER JOIN movie_companies AS mc
    ON t.id = mc.movie_id
    AND ci.movie_id = mc.movie_id
    AND mc.movie_id = mk.movie_id
INNER JOIN keyword AS k
    ON mk.keyword_id = k.id
INNER JOIN company_name AS cn
    ON mc.company_id = cn.id;