Optimize slow query when multiple JOIN LATERAL are present
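This is a follow-up optimization to an earlier question (albeit with some optimizations and simplified joins).
I would like to know whether the following PostgreSQL 13.1 query, which takes 130322.2 ms to complete, can be optimized. Normally, when only one JOIN LATERAL is present, it completes in a few milliseconds.
What confuses me most is that each JOIN LATERAL has an ON condition based on the score from its own subquery. How could I optimize the query, possibly with fewer JOIN LATERAL, and still get the same result?
As far as I can tell, it seems to get slower when the WHERE inside some of the JOIN LATERAL subqueries uses OR instead of AND. See: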
SELECT count(*)
FROM subscriptions q
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 21
) AS q62958 ON q62958.sum_score <= 1
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 32
) AS q120342 ON q120342.sum_score <= 1
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 35
) AS q992506 ON q992506.sum_score <= 1
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 33
) AS q343255 ON q343255.sum_score <= 1
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 29
) AS q532052 ON q532052.sum_score <= 1
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 30
) AS q268437 ON q268437.sum_score <= 1
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 46
) AS q553964 ON q553964.sum_score >= 3
JOIN LATERAL (
    SELECT
        SUM(ts.score) AS sum_score
    FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
    WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 24
) AS q928243 ON q928243.sum_score >= 2
WHERE
    q.state = 'subscribed' AND q.app_id = 4
;
The subscriptions table has fewer than 15000 rows, and fewer than 2000 of them match the WHERE clause. Both q.state and q.app_id are indexed.
Full EXPLAIN ANALYZE: https://explain.depesz.com/s/Ok0h
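The main problem is that the query is wrong:
    WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 32
The OR is misplaced here: the right rows have to be selected before they are aggregated. This WHERE clause includes all rows with either a matching quiz_id or tag_id = 32. That is every row of the current quiz regardless of tag, plus every tag 32 row of every other quiz, which is nonsense (and expensive, since the subquery is no longer restricted to one quiz).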
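To make the difference concrete, here is a tiny sketch with made-up data. The temporary tables only borrow the names and columns from the question (the answers table is left out for brevity), so run it in a scratch session, not against the real schema.

```sql
-- Hypothetical miniature tables; data invented purely for illustration.
CREATE TEMP TABLE quiz_answers (quiz_id int, answer_id int);
CREATE TEMP TABLE tag_scores   (answer_id int, tag_id int, score int);

INSERT INTO quiz_answers VALUES (1, 10), (2, 20);
INSERT INTO tag_scores   VALUES
  (10, 32, 1)   -- quiz 1, tag 32
, (10, 21, 5)   -- quiz 1, some other tag
, (20, 32, 7);  -- quiz 2, tag 32

-- AND: only the tag 32 rows of quiz 1 are summed
SELECT sum(ts.score) AS sum_and
FROM   quiz_answers qa
JOIN   tag_scores   ts USING (answer_id)
WHERE  qa.quiz_id = 1
AND    ts.tag_id = 32;   -- returns 1

-- OR: every row of quiz 1 plus every tag 32 row of any quiz is summed
SELECT sum(ts.score) AS sum_or
FROM   quiz_answers qa
JOIN   tag_scores   ts USING (answer_id)
WHERE  qa.quiz_id = 1
OR     ts.tag_id = 32;   -- returns 13 (1 + 5 + 7)
```

With OR, the sum for quiz 1 picks up the score of quiz 2, so the result no longer describes a single quiz.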
Apart from that, you can merge the multiple LATERAL subqueries into a single one with conditional aggregates, like this:
SELECT count(*)
FROM   subscriptions q
JOIN   LATERAL (
   SELECT sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
     -- , more?
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
   WHERE  qa.quiz_id = q.quiz_id
   AND    ts.tag_id IN (21, 32, 35)  -- more?
   ) AS t ON t.sum_score21 <= 1
          OR t.sum_score32 <= 1
          OR t.sum_score35 <= 1
          -- AND / OR more?
WHERE  q.state = 'subscribed'
AND    q.app_id = 4;
Mind operator precedence when adding more conditions with AND or OR: you may need parentheses, because AND binds before OR.
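A quick way to see the precedence rule in isolation, using plain boolean constants instead of the actual score columns:

```sql
SELECT true OR false AND false   AS without_parens  -- parsed as true OR (false AND false), yields true
     , (true OR false) AND false AS with_parens;    -- yields false
```

The same applies to the ON clause above: something like a OR b AND c is read as a OR (b AND c) unless you parenthesize it.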
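About the aggregate FILTER clause:
- Aggregate columns with additional (distinct) filters
A multicolumn index on subscriptions(app_id, state, quiz_id) might help (it allows index-only scans). But since the table isn't that big, it won't matter much.
LATERAL (rather than a plain subquery) still makes sense as long as the outer filter eliminates most rows from table subscriptions. A multicolumn index on tag_scores(answer_id, tag_id) might help.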
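As a sketch, the two indexes could be created like this. The index names are made up for illustration, and whether the planner actually uses them depends on your data, so check with EXPLAIN (ANALYZE, BUFFERS) before and after.

```sql
-- Hypothetical index names; column order as suggested above.
CREATE INDEX subscriptions_app_state_quiz_idx
    ON subscriptions (app_id, state, quiz_id);

CREATE INDEX tag_scores_answer_tag_idx
    ON tag_scores (answer_id, tag_id);

-- Optional variant (PostgreSQL 11 or later), assuming index-only scans
-- on tag_scores are worth the extra index size:
-- CREATE INDEX tag_scores_answer_tag_score_idx
--     ON tag_scores (answer_id, tag_id) INCLUDE (score);
```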
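With more tags in the subquery and/or more qualifying subscriptions, the LATERAL variant and the said index become less useful.
For comparison, here is a variant with a plain subquery: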
SELECT count(*)
FROM   subscriptions q
JOIN  (
   SELECT qa.quiz_id
        , sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
     -- , more?
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
   WHERE  ts.tag_id IN (21, 32, 35)  -- more?
   GROUP  BY qa.quiz_id
   ) AS t USING (quiz_id)
WHERE  q.state = 'subscribed'
AND    q.app_id = 4
AND   (t.sum_score21 <= 1
    OR t.sum_score32 <= 1
    OR t.sum_score35 <= 1)
    -- AND / OR more?
;
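Either way, t.sum_score21 <= 1 qualifies if ...
- at least one row is associated with tag 21 (otherwise the sum is NULL and the comparison is not true)
- and the total score of all rows with tag 21 in the same quiz is <= 1
That seems like a very narrow filter.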
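The first point follows from how sum() treats an empty set. A minimal, self-contained illustration with an invented single row (tag 32 only, so nothing matches tag 21):

```sql
SELECT sum(score) FILTER (WHERE tag_id = 21)                   AS sum21        -- NULL: no tag 21 row
     , sum(score) FILTER (WHERE tag_id = 21) <= 1              AS qualifies    -- NULL, so the row would be dropped
     , COALESCE(sum(score) FILTER (WHERE tag_id = 21), 0) <= 1 AS with_default -- true
FROM  (VALUES (32, 4)) AS v(tag_id, score);
```

If quizzes without any row for a tag should count as score 0, wrapping the aggregate in COALESCE as in the third column would do that. Whether that is the intended behavior is an assumption only you can confirm.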
Noise reduction:
If rows in answers can never be missing (referential integrity enforced with an FK constraint?), you can cut out the middleman here:
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
-->
   FROM   quiz_answers qa
   JOIN   tag_scores   ts USING (answer_id)
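Putting the pieces together for the eight conditions in the question might look like the sketch below. It rests on two assumptions: that AND was intended in every subquery (so all eight conditions must hold, as the chain of inner joins implies), and that the join to answers can be dropped as described above. The tag ids and thresholds are taken from the original query.

```sql
SELECT count(*)
FROM   subscriptions q
JOIN   LATERAL (
   SELECT sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
        , sum(ts.score) FILTER (WHERE ts.tag_id = 33) AS sum_score33
        , sum(ts.score) FILTER (WHERE ts.tag_id = 29) AS sum_score29
        , sum(ts.score) FILTER (WHERE ts.tag_id = 30) AS sum_score30
        , sum(ts.score) FILTER (WHERE ts.tag_id = 46) AS sum_score46
        , sum(ts.score) FILTER (WHERE ts.tag_id = 24) AS sum_score24
   FROM   quiz_answers qa
   JOIN   tag_scores   ts USING (answer_id)  -- assumes answers is not needed
   WHERE  qa.quiz_id = q.quiz_id
   AND    ts.tag_id IN (21, 32, 35, 33, 29, 30, 46, 24)
   ) AS t ON t.sum_score21 <= 1
         AND t.sum_score32 <= 1
         AND t.sum_score35 <= 1
         AND t.sum_score33 <= 1
         AND t.sum_score29 <= 1
         AND t.sum_score30 <= 1
         AND t.sum_score46 >= 3
         AND t.sum_score24 >= 2
WHERE  q.state = 'subscribed'
AND    q.app_id = 4;
```

This scans the rows of each quiz once instead of eight times. I have not run it against your data, so compare its count with the corrected original before relying on it.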