Query plan difference inner join/right join "greatest-n-per-group", self joined, aggregate query
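For a small Postgres 10 data warehouse I was checking our analytics queries for possible improvements and found a rather slow query, where the potential improvement basically comes down to this subquery (the classic greatest-n-per-group problem):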
SELECT s_postings.*
FROM dwh.s_postings
JOIN (SELECT s_postings.id,
max(s_postings.load_dts) AS load_dts
FROM dwh.s_postings
GROUP BY s_postings.id) AS current_postings
ON s_postings.id = current_postings.id AND s_postings.load_dts = current_postings.load_dts
With the following execution plan:
"Gather (cost=23808.51..38602.59 rows=66 width=376) (actual time=1385.927..1810.844 rows=170847 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Hash Join (cost=22808.51..37595.99 rows=28 width=376) (actual time=1199.647..1490.652 rows=56949 loops=3)"
" Hash Cond: (((s_postings.id)::text = (s_postings_1.id)::text) AND (s_postings.load_dts = (max(s_postings_1.load_dts))))"
" -> Parallel Seq Scan on s_postings (cost=0.00..14113.25 rows=128425 width=376) (actual time=0.016..73.604 rows=102723 loops=3)"
" -> Hash (cost=20513.00..20513.00 rows=153034 width=75) (actual time=1195.616..1195.616 rows=170847 loops=3)"
" Buckets: 262144 Batches: 1 Memory Usage: 20735kB"
" -> HashAggregate (cost=17452.32..18982.66 rows=153034 width=75) (actual time=836.694..1015.499 rows=170847 loops=3)"
" Group Key: s_postings_1.id"
" -> Seq Scan on s_postings s_postings_1 (cost=0.00..15911.21 rows=308221 width=75) (actual time=0.032..251.122 rows=308168 loops=3)"
"Planning time: 1.184 ms"
"Execution time: 1912.865 ms"
The row estimate is absolutely wrong! What is strange to me is what happens if I now change the join to a RIGHT JOIN:
SELECT s_postings.*
FROM dwh.s_postings
RIGHT JOIN (SELECT s_postings.id,
max(s_postings.load_dts) AS load_dts
FROM dwh.s_postings
GROUP BY s_postings.id) AS current_postings
ON s_postings.id = current_postings.id AND s_postings.load_dts = current_postings.load_dts
With the execution plan:
"Hash Right Join (cost=22829.85..40375.62 rows=153177 width=376) (actual time=814.097..1399.673 rows=170848 loops=1)"
" Hash Cond: (((s_postings.id)::text = (s_postings_1.id)::text) AND (s_postings.load_dts = (max(s_postings_1.load_dts))))"
" -> Seq Scan on s_postings (cost=0.00..15926.10 rows=308510 width=376) (actual time=0.011..144.584 rows=308419 loops=1)"
" -> Hash (cost=20532.19..20532.19 rows=153177 width=75) (actual time=812.587..812.587 rows=170848 loops=1)"
" Buckets: 262144 Batches: 1 Memory Usage: 20735kB"
" -> HashAggregate (cost=17468.65..19000.42 rows=153177 width=75) (actual time=553.633..683.850 rows=170848 loops=1)"
" Group Key: s_postings_1.id"
" -> Seq Scan on s_postings s_postings_1 (cost=0.00..15926.10 rows=308510 width=75) (actual time=0.011..157.000 rows=308419 loops=1)"
"Planning time: 0.402 ms"
"Execution time: 1469.808 ms"
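The row estimate is much better!
I am aware that for example parallel sequential scans can in some conditions decrease performance but they should not change the row estimate!?
If I remember correctly aggregate functions also block the proper use of indexes, and I also don't see any potential gain from additional multivariate statistics, e.g. for the tuple id, load_dts. The database is VACUUM ANALYZEd.
To me the queries are logically the same.
Is there a way to support the query planner to make better assumptions about the estimates or improve the query? Maybe somebody knows a reason why this difference exists?
Edit: The join condition was previously ON s_postings.id::text = current_postings.id::text. I changed it to ON s_postings.id = current_postings.id so as not to confuse anyone. Removing this cast does not change the query plan.
Edit2: As suggested below, there is a different solution to the greatest-n-per-group problem: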
SELECT p.*
FROM (SELECT p.*,
RANK() OVER (PARTITION BY p.id ORDER BY p.load_dts DESC) as seqnum
FROM dwh.s_postings p
) p
WHERE seqnum = 1;
A really nice solution, but sadly the query planner also underestimates the row count:
"Subquery Scan on p (cost=44151.67..54199.31 rows=1546 width=384) (actual time=1742.902..2594.359 rows=171269 loops=1)"
" Filter: (p.seqnum = 1)"
" Rows Removed by Filter: 137803"
" -> WindowAgg (cost=44151.67..50334.83 rows=309158 width=384) (actual time=1742.899..2408.240 rows=309072 loops=1)"
" -> Sort (cost=44151.67..44924.57 rows=309158 width=376) (actual time=1742.887..1927.325 rows=309072 loops=1)"
" Sort Key: p_1.id, p_1.load_dts DESC"
" Sort Method: quicksort Memory: 172275kB"
" -> Seq Scan on s_postings p_1 (cost=0.00..15959.58 rows=309158 width=376) (actual time=0.007..221.240 rows=309072 loops=1)"
"Planning time: 0.149 ms"
"Execution time: 2666.645 ms"
Use window functions:
SELECT p.*
FROM (SELECT p.*,
RANK() OVER (PARTITION BY p.id ORDER BY p.load_dts DESC) as seqnum
FROM dwh.s_postings p
) p
WHERE seqnum = 1;
Or, better yet, if you want one row per id, use DISTINCT ON:
SELECT DISTINCT ON (p.id) p.*
FROM dwh.s_postings p
ORDER BY p.id, p.load_dts DESC;
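If I had to speculate, the cast on id, which is totally unnecessary, throws off the optimizer. With the right join it is clear that all rows are kept from one of the tables, and that might help the statistics calculation.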
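The difference in timing is not very large. It could easily just be caching effects. If you alternate between the two queries back-to-back repeatedly, do you still see the difference? If you disable parallel execution by setting max_parallel_workers_per_gather = 0, does that equalize them?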
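A minimal sketch of that comparison, assuming a psql session against the same database. Parallelism is switched off for the session only, and the BUFFERS option is added so the output shows how many pages came from shared buffers, which helps separate caching effects from plan effects.

```sql
-- Session-local: disable parallel plans so both variants run serially.
SET max_parallel_workers_per_gather = 0;

-- Run both statements several times, alternating, and compare the later runs.
EXPLAIN (ANALYZE, BUFFERS)
SELECT s_postings.*
FROM dwh.s_postings
JOIN (SELECT id, max(load_dts) AS load_dts
      FROM dwh.s_postings
      GROUP BY id) AS current_postings
  ON s_postings.id = current_postings.id
 AND s_postings.load_dts = current_postings.load_dts;

EXPLAIN (ANALYZE, BUFFERS)
SELECT s_postings.*
FROM dwh.s_postings
RIGHT JOIN (SELECT id, max(load_dts) AS load_dts
            FROM dwh.s_postings
            GROUP BY id) AS current_postings
  ON s_postings.id = current_postings.id
 AND s_postings.load_dts = current_postings.load_dts;

-- Restore the configured value afterwards.
RESET max_parallel_workers_per_gather;
```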
The row estimate is absolutely wrong!
While this is obviously true, I don't think the misestimate is causing anything particularly bad to happen.
I am aware that for example parallel sequential scans can in some conditions decrease performance but they should not change the row estimate!?
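Correct. It is the change in JOIN type that causes the change in the estimate, and that in turn causes the change in parallelization. Because of parallel_tuple_cost, believing that it has to push more tuples up to the leader (rather than disqualifying them down in the workers) discourages the parallel plan.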
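One way to probe this explanation, offered as a sketch rather than a recommendation: temporarily remove the per-tuple transfer charge and see whether the planner then considers a Gather for the RIGHT JOIN variant. Whether the plan actually changes depends on the other cost settings and on the table size, so treat the result as a hint only.

```sql
-- The planner's estimated cost per tuple passed from a worker to the leader
-- (0.1 unless it has been changed).
SHOW parallel_tuple_cost;

-- Session-local experiment only; not meant as a production setting.
SET parallel_tuple_cost = 0;

EXPLAIN
SELECT s_postings.*
FROM dwh.s_postings
RIGHT JOIN (SELECT id, max(load_dts) AS load_dts
            FROM dwh.s_postings
            GROUP BY id) AS current_postings
  ON s_postings.id = current_postings.id
 AND s_postings.load_dts = current_postings.load_dts;

RESET parallel_tuple_cost;
```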
If I remember correctly aggregate functions also block the proper use of indexes
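No, an index on (id, load_dts) or even just (id) should be usable for doing the aggregation, but since you need to read the entire table, it will probably be slower to read the entire index and the entire table than it is to just read the entire table into a HashAgg. You can test whether PostgreSQL thinks it is capable of using such an index by setting enable_seqscan=off. If it still does the seq scan, then it doesn't think the index is usable. Otherwise, it just thinks that using the index is counterproductive.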
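The test described above could look roughly like this. The index name is made up for illustration, and the CREATE INDEX line is only needed if no such index exists yet.

```sql
-- Hypothetical index name; skip this if an index on (id, load_dts) already exists.
CREATE INDEX s_postings_id_load_dts_idx
    ON dwh.s_postings (id, load_dts);

-- Session-local: make sequential scans look prohibitively expensive.
SET enable_seqscan = off;

-- If the plan still shows "Seq Scan", the planner considers the index unusable
-- here; if it switches to an index scan, the index is usable but was judged
-- more expensive than the seq scan.
EXPLAIN
SELECT id, max(load_dts) AS load_dts
FROM dwh.s_postings
GROUP BY id;

RESET enable_seqscan;
```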
Is there a way to support the query planner to make better assumptions about the estimates or improve the query? Maybe somebody knows a reason why this difference exists?
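The planner lacks the insight to know that every (id, max(load_dts)) pair from the derived table must have come from at least one row in the original table. Instead, it applies the two conditions in the ON clause as independent variables, and it doesn't even know what the most common values or histograms for the derived table will be, so it can't predict the degree of overlap. With the RIGHT JOIN, however, it knows that every row in the derived table gets returned, whether or not a match is found in the "other" table. If you create a temp table from your derived subquery, ANALYZE it, and then use that table in the join, you should get better estimates, because the planner then at least knows how much the distributions in each column overlap. But those better estimates are not likely to lead to much better plans, so I wouldn't bother with that complexity.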
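For completeness, the temp table variant mentioned above might be sketched as follows. The temp table name simply reuses the alias from the question, and the statistics only describe the data as of the moment ANALYZE runs.

```sql
-- Materialize the derived table so it can have its own statistics.
CREATE TEMP TABLE current_postings AS
SELECT id, max(load_dts) AS load_dts
FROM dwh.s_postings
GROUP BY id;

-- Collect most-common-value lists and histograms for both columns.
ANALYZE current_postings;

-- Compare the estimated and actual row counts with the plans in the question.
EXPLAIN ANALYZE
SELECT p.*
FROM dwh.s_postings AS p
JOIN current_postings AS c
  ON p.id = c.id
 AND p.load_dts = c.load_dts;

DROP TABLE current_postings;
```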
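You can probably get some marginal speedup by rewriting it as a DISTINCT ON query, but it won't be magically better. Also note that these are not equivalent. The join will return all rows that are tied for first place within a given id, while DISTINCT ON will return an arbitrary one of them (unless you add columns to the ORDER BY to break the ties).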
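Whether the difference matters for this table can be checked directly. The first query below lists any (id, load_dts) pairs that occur more than once; if it returns no rows, there are no ties and the join and DISTINCT ON variants return the same rows. The second query shows where a tie-breaker would go; tie_breaker_col is a placeholder for whatever column makes sense, not a column known to exist in dwh.s_postings.

```sql
-- Any output here means ties on load_dts are possible within an id.
SELECT id, load_dts, count(*) AS n
FROM dwh.s_postings
GROUP BY id, load_dts
HAVING count(*) > 1;

-- Deterministic DISTINCT ON: the extra ORDER BY column decides between ties.
-- tie_breaker_col is hypothetical; substitute a real column.
SELECT DISTINCT ON (p.id) p.*
FROM dwh.s_postings AS p
ORDER BY p.id, p.load_dts DESC, p.tie_breaker_col DESC;
```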