Optimize slow query when multiple JOIN LATERAL are present

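This is a follow-up optimization to an earlier question (with some optimizations and join simplifications applied).

I would like to know whether the following PostgreSQL 13.1 query, which takes 130322.2 ms to complete, can be optimized. Typically, when only a single JOIN LATERAL is present, it completes in a few milliseconds.

What confuses me most is that every JOIN LATERAL has an ON condition based on the score from its own subquery. How can I optimize the query, possibly by reducing the number of JOIN LATERALs, and still get the same result?

From what I can tell, the query seems to get slower when some of the WHERE clauses inside the JOIN LATERAL subqueries use OR instead of AND. See: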

SELECT count(*)
FROM subscriptions q
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 21
    ) AS q62958 ON q62958.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 32
    ) AS q120342 ON q120342.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 35
    ) AS q992506 ON q992506.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 33
    ) AS q343255 ON q343255.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 29
    ) AS q532052 ON q532052.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        OR ts.tag_id = 30
    ) AS q268437 ON q268437.sum_score <= 1
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 46
    ) AS q553964 ON q553964.sum_score >= 3
JOIN LATERAL (
      SELECT
        SUM(ts.score) AS sum_score
      FROM
        quiz_answers qa
        JOIN answers a ON a.id = qa.answer_id
        JOIN tag_scores ts ON ts.answer_id = a.id
      WHERE
        qa.quiz_id = q.quiz_id
        AND ts.tag_id = 24
    ) AS q928243 ON q928243.sum_score >= 2
WHERE
  q.state = 'subscribed' AND q.app_id = 4
;

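The subscriptions table has fewer than 15000 rows, and fewer than 2000 of them match the WHERE clause. Both q.state and q.app_id are indexed.

Full EXPLAIN ANALYZE: https://explain.depesz.com/s/Ok0h

The main problem is that the query is incorrect: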

 WHERE
            qa.quiz_id = q.quiz_id
            OR ts.tag_id = 32

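The OR is misplaced here: the aggregate that follows must only see the correct rows. This WHERE clause lets through every row that either has a matching quiz_id or has tag_id = 32. So the sum covers rows of the matching quiz with any tag, plus rows with tag 32 from any quiz, which is nonsense.

Beyond that, you can merge the multiple LATERAL subqueries into a single one using conditional aggregation, like this: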

SELECT count(*)
FROM   subscriptions q
JOIN   LATERAL (
   SELECT sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
     -- , more?
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
   WHERE  qa.quiz_id = q.quiz_id
   AND    ts.tag_id IN (21, 32, 35) -- more?
   ) AS t ON t.sum_score21 <= 1
          OR t.sum_score32 <= 1
          OR t.sum_score35 <= 1
       -- AND / OR more?
WHERE  q.state = 'subscribed'
AND    q.app_id = 4;

When adding more conditions with AND and OR, mind operator precedence: you may need parentheses, because AND binds before OR.
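A minimal illustration of the precedence rule, using constant booleans only (nothing here depends on the tables in the question):

```sql
-- AND binds more tightly than OR, so the first expression is read as
-- true OR (false AND false).
SELECT true OR false AND false   AS without_parens  -- true
     , (true OR false) AND false AS with_parens;    -- false
```

The same applies to a mixed ON clause such as `a <= 1 OR b <= 1 AND c >= 3`: without parentheses, the AND pair is evaluated first.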

About the aggregate FILTER clause:

  • Aggregate columns with additional (distinct) filters

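A multicolumn index on subscriptions(app_id, state, quiz_id) might help (giving you index-only scans). But since the table is not that big, it hardly matters.

LATERAL (rather than a plain subquery) still makes sense as long as the outer filter eliminates most rows from table subscriptions. A multicolumn index on tag_scores(answer_id, tag_id) might help.

With more tags in the subquery and/or more qualifying subscriptions, the usefulness of the LATERAL variant and of said indexes declines.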
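The two indexes mentioned above could be created roughly as follows. The index names are made up for illustration, and the column names are taken from the queries in the question; whether either index actually pays off is best checked with EXPLAIN ANALYZE.

```sql
-- Hypothetical index names; columns as used in the queries above.

-- Covers the outer filter (app_id, state) and carries quiz_id,
-- which may allow an index-only scan on subscriptions.
CREATE INDEX subscriptions_app_state_quiz_idx
    ON subscriptions (app_id, state, quiz_id);

-- Supports the join on answer_id together with the tag_id filter.
CREATE INDEX tag_scores_answer_tag_idx
    ON tag_scores (answer_id, tag_id);
```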

For comparison, here is a variant with a plain subquery:

SELECT count(*)
FROM   subscriptions q
JOIN  (
   SELECT qa.quiz_id
        , sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
     -- , more?
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
   WHERE  ts.tag_id IN (21, 32, 35) -- more?
   GROUP  BY qa.quiz_id
   ) AS t USING (quiz_id)
WHERE  q.state = 'subscribed'
AND    q.app_id = 4
AND   (t.sum_score21 <= 1
    OR t.sum_score32 <= 1
    OR t.sum_score35 <= 1)
    -- AND / OR more?
;

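Either way, t.sum_score21 <= 1 qualifies if ...

  • at least one row is associated with tag 21
  • and the total score of all rows with tag 21 in the same quiz is <= 1

Seems like a very narrow filter.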
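The first bullet follows from NULL handling: sum() over zero rows returns NULL, and NULL <= 1 is not true, so a quiz without any row for tag 21 is filtered out. A small self-contained sketch with a made-up one-row input (no row carries tag 21):

```sql
-- Made-up data: a single row with tag 32, none with tag 21.
SELECT sum(score) FILTER (WHERE tag_id = 21)                   AS sum_score21    -- NULL
     , sum(score) FILTER (WHERE tag_id = 21) <= 1              AS qualifies      -- NULL, row would be dropped
     , COALESCE(sum(score) FILTER (WHERE tag_id = 21), 0) <= 1 AS with_coalesce  -- true
FROM  (VALUES (32, 5)) AS ts(tag_id, score);
```

If quizzes without any row for a tag should count as a sum of 0, wrapping the aggregate in COALESCE(..., 0) would be one way to express that. Whether that is the desired behavior is an assumption the question does not settle.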

Reducing noise:

If rows in answers can never be missing (referential integrity enforced with FK constraints?), you can cut out the middleman here:

   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id

-->

   FROM   quiz_answers qa
   JOIN   tag_scores   ts USING (answer_id)
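Putting the pieces together, here is a sketch of what the complete rewrite of the original query might look like. It rests on several assumptions: the OR in the question's subqueries was meant to be AND, all eight ON conditions have to hold at once (which is what a chain of inner joins expresses, hence AND rather than OR between them), and answers can be dropped because a foreign key guarantees every answer_id exists.

```sql
-- Sketch only. Assumes the subquery ORs were meant as AND and that
-- answers can be skipped (FK on answer_id enforced).
SELECT count(*)
FROM   subscriptions q
JOIN   LATERAL (
   SELECT sum(ts.score) FILTER (WHERE ts.tag_id = 21) AS sum_score21
        , sum(ts.score) FILTER (WHERE ts.tag_id = 32) AS sum_score32
        , sum(ts.score) FILTER (WHERE ts.tag_id = 35) AS sum_score35
        , sum(ts.score) FILTER (WHERE ts.tag_id = 33) AS sum_score33
        , sum(ts.score) FILTER (WHERE ts.tag_id = 29) AS sum_score29
        , sum(ts.score) FILTER (WHERE ts.tag_id = 30) AS sum_score30
        , sum(ts.score) FILTER (WHERE ts.tag_id = 46) AS sum_score46
        , sum(ts.score) FILTER (WHERE ts.tag_id = 24) AS sum_score24
   FROM   quiz_answers qa
   JOIN   tag_scores   ts USING (answer_id)
   WHERE  qa.quiz_id = q.quiz_id
   AND    ts.tag_id IN (21, 32, 35, 33, 29, 30, 46, 24)
   ) AS t ON t.sum_score21 <= 1
         AND t.sum_score32 <= 1
         AND t.sum_score35 <= 1
         AND t.sum_score33 <= 1
         AND t.sum_score29 <= 1
         AND t.sum_score30 <= 1
         AND t.sum_score46 >= 3
         AND t.sum_score24 >= 2
WHERE  q.state = 'subscribed'
AND    q.app_id = 4;
```

The subquery now runs once per qualifying subscription instead of eight times, and the thresholds are the ones from the ON clauses in the question.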