Returning unique items in Postgres

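I want to return unique questions, but if a question has multiple answers, every answer to that question is returned, even with DISTINCT. I only want each question returned once. Here is my SQL: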

SELECT 
DISTINCT questions.id AS question_id,
questions.title AS question_title,
questions.created_at AS questionCreatedAt,
questions.updated_at AS question_updated_at,
answers.id AS answer_id,
answers.content AS answer_content,
answers.created_at AS answer_created_at,
answers.updated_at AS answer_updated_at,
(SELECT SUM(votes.value) AS votes FROM votes WHERE answers.id =votes.answer_id)
FROM questions
LEFT JOIN answers ON questions.id = answers.question_id
LEFT JOIN votes ON answers.id = votes.answer_id; 

You should use "DISTINCT ON" instead of "DISTINCT".

SELECT 
DISTINCT ON (questions.id) questions.id,
questions.title AS question_title,
questions.created_at AS questionCreatedAt,
questions.updated_at AS question_updated_at,
answers.id AS answer_id,
answers.content AS answer_content,
answers.created_at AS answer_created_at,
answers.updated_at AS answer_updated_at,
(SELECT SUM(votes.value) AS votes FROM votes WHERE answers.id =votes.answer_id)
FROM questions
LEFT JOIN answers ON questions.id = answers.question_id
LEFT JOIN votes ON answers.id = votes.answer_id; 

Similar question

Guide

To return a specific row from each group, you need to add a deterministic ORDER BY clause. Basically:

SELECT DISTINCT ON (q.id)
       q.id AS question_id
     , q.title AS question_title
     , q.created_at AS question_created_at
     , q.updated_at AS question_updated_at
     , a.id AS answer_id
     , a.content AS answer_content
     , a.created_at AS answer_created_at
     , a.updated_at AS answer_updated_at
     , SUM(v.value) AS votes
FROM   questions q
LEFT   JOIN answers a ON q.id = a.question_id
LEFT   JOIN votes   v ON a.id = v.answer_id
GROUP  BY q.id, a.id   -- for the sum
ORDER  BY q.id, a.created_at DESC NULLS LAST, a.id;

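The first ORDER BY item(s) must agree with the DISTINCT ON clause.
You want the "latest" answer, so a.created_at DESC comes next.
NULLS LAST because the column might be nullable (you didn't disclose that).
The final a.id is just a tiebreaker, in case multiple answers share the same a.created_at.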

Detailed explanation:

  • Select first row in each GROUP BY group?
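As a rough illustration of what DISTINCT ON (q.id) combined with that ORDER BY selects, here is a minimal sketch that emulates the semantics in plain Python. It does not run Postgres, and the sample rows are made up for illustration. The idea is: sort by the full ORDER BY list, then keep the first row of each q.id group.

```python
from itertools import groupby

# Hypothetical joined rows: (question_id, answer_id, answer_created_at).
# None stands for SQL NULL (e.g. a question without answers after a LEFT JOIN).
rows = [
    (1, 10, "2020-01-01"),
    (1, 11, "2020-03-01"),
    (1, 12, "2020-03-01"),  # ties with answer 11 on created_at
    (2, 20, None),          # answer with a NULL created_at
    (2, 21, "2019-05-05"),
    (3, None, None),        # question without any answer
]

# Emulate ORDER BY q.id, a.created_at DESC NULLS LAST, a.id with stable
# sorts applied from the least significant key to the most significant one.
ordered = sorted(rows, key=lambda r: (r[1] is None, r[1] or 0))  # a.id ASC, NULLs last
ordered = sorted(ordered, key=lambda r: r[2] or "", reverse=True)  # created_at DESC, NULLs last
ordered = sorted(ordered, key=lambda r: r[0])  # q.id

# DISTINCT ON (q.id): keep only the first row of every q.id group.
picked = [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]
print(picked)
```

Without the sort, "first row per group" would depend on whatever order the rows happen to arrive in, which is why the ORDER BY matters.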

Once you have already joined to votes, you don't need the correlated subquery for the vote total:

(SELECT SUM(votes.value) AS votes FROM votes WHERE answers.id =votes.answer_id)

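As it stands, you probably get incorrect (multiplied) sums. Assuming a one-to-many relationship between answers and votes (otherwise the vote count could simply be another column in answers), it is either: join to the table and then GROUP BY, or don't join to the table and add a correlated subquery.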
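To make the two options concrete, here is a small sketch using Python's built-in sqlite3 module rather than Postgres (the SQL involved is portable). The table contents are invented for illustration. It shows that mixing the join with the correlated subquery repeats each answer once per vote, while either option on its own yields one row per answer with the same total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE answers (id INTEGER PRIMARY KEY, question_id INTEGER, content TEXT);
    CREATE TABLE votes   (id INTEGER PRIMARY KEY, answer_id INTEGER, value INTEGER);
    INSERT INTO answers VALUES (1, 1, 'first'), (2, 1, 'second');
    INSERT INTO votes (answer_id, value) VALUES (1, 1), (1, 1), (1, -1), (2, 1);
""")

# Join AND correlated subquery, no GROUP BY: one output row per vote.
mixed = conn.execute("""
    SELECT a.id, (SELECT SUM(v2.value) FROM votes v2 WHERE v2.answer_id = a.id)
    FROM answers a
    LEFT JOIN votes v ON v.answer_id = a.id
    ORDER BY a.id
""").fetchall()

# Option 1: join, then GROUP BY the answer.
joined = conn.execute("""
    SELECT a.id, SUM(v.value)
    FROM answers a
    LEFT JOIN votes v ON v.answer_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Option 2: no join, correlated subquery only.
correlated = conn.execute("""
    SELECT a.id, (SELECT SUM(v.value) FROM votes v WHERE v.answer_id = a.id)
    FROM answers a
    ORDER BY a.id
""").fetchall()

print(len(mixed), joined, correlated)
```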

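I fixed it with a plain sum(), keeping the join, assuming that q.id and a.id are the primary keys of their respective tables (you didn't disclose the table definitions). This is possible because DISTINCT ON is applied after GROUP BY. See: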

  • Best way to get result count before LIMIT was applied

Or see below for a possibly better solution.

When you return all or most questions, the query is typically faster if you join after getting the latest answer per question. Like:

SELECT q.id AS question_id
     , q.title AS question_title
     , q.created_at AS question_created_at
     , q.updated_at AS question_updated_at
     , a.answer_id                -- column names as aliased in the subquery
     , a.answer_content
     , a.answer_created_at
     , a.answer_updated_at
     , u.user_name                -- whatever you need from users table
     , (SELECT SUM(value) FROM votes v WHERE v.answer_id = a.answer_id) AS votes
FROM   questions q
LEFT   JOIN (
   SELECT DISTINCT ON (a.question_id)
          a.question_id AS id
        , a.id AS answer_id
        , a.content AS answer_content
        , a.created_at AS answer_created_at
        , a.updated_at AS answer_updated_at
        , a.user_id
   FROM   answers    a
   ORDER  BY a.question_id, a.created_at DESC NULLS LAST, a.id
   ) a USING (id)
LEFT   JOIN users u ON u.id = a.user_id;

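Here I kept the correlated subquery for votes, because it is typically cheaper to compute the sum after reducing to the selected answers than to compute it for all answers.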

Similarly for users (added in your answer): join after reducing to the selected answers. And put whatever you actually want to return from users into the SELECT list.

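If your table answers is big, a multicolumn index on answers (question_id, created_at DESC NULLS LAST) would be ideal for performance.
If there are many answers per question, a different query technique may be faster. See: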

  • Optimize GROUP BY query to retrieve latest row per user

To retrieve only a small fraction of all questions, LATERAL or correlated subqueries are typically faster.

Details depend on the undisclosed table definitions and cardinalities.

I managed to get the latest answer with this:

SELECT
DISTINCT ON (questions.id) questions.id,
questions.title AS question_title,
questions.created_at AS questionCreatedAt,
questions.updated_at AS question_updated_at,
answers.id AS answer_id,
answers.content AS answer_content,
answers.created_at AS answer_created_at,
answers.updated_at AS answer_updated_at,
(SELECT SUM(votes.value) AS votes FROM votes WHERE answers.id =votes.answer_id)
FROM questions
LEFT JOIN answers ON questions.id = answers.question_id
LEFT JOIN users ON  users.id  = answers.user_id
LEFT JOIN votes ON answers.id = votes.answer_id
ORDER  BY questions.id, answers.created_at DESC NULLS LAST, answers.id;

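select distinct is a row operator, which means it inspects every selected column and considers whether that row is different from every other row. If the current row differs from any other row in any of the selected columns, it is returned. In your case you have joined questions to answers, and I assume that once more than one person has answered, you are likely to get different answers, so those differences result in more rows. If you only want distinct questions, then don't join tables that multiply the results.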

But your query also seems to want a SUM(), so instead of pursuing select distinct, perhaps you could consider using group by, like this:

SELECT
  questions.id AS question_id,
  questions.title AS question_title,
  questions.created_at AS questionCreatedAt,
  questions.updated_at AS question_updated_at,
  COUNT(DISTINCT answers.id) AS num_answers,
  MIN(answers.created_at) AS answer_created_at,
  MAX(answers.updated_at) AS answer_updated_at,
  SUM(votes.value) AS votes
FROM questions
LEFT JOIN answers
  ON questions.id = answers.question_id
LEFT JOIN votes
  ON answers.id = votes.answer_id
GROUP BY
  questions.id,
  questions.title,
  questions.created_at,
  questions.updated_at
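The DISTINCT inside COUNT matters in the query above, because the join to votes repeats each answer once per vote. A minimal sketch of that effect, using Python's built-in sqlite3 module instead of Postgres and invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE answers   (id INTEGER PRIMARY KEY, question_id INTEGER, content TEXT);
    CREATE TABLE votes     (id INTEGER PRIMARY KEY, answer_id INTEGER, value INTEGER);
    INSERT INTO questions VALUES (1, 'answered'), (2, 'unanswered');
    INSERT INTO answers   VALUES (1, 1, 'first'), (2, 1, 'second');
    INSERT INTO votes (answer_id, value) VALUES (1, 1), (1, 1), (1, -1), (2, 1);
""")

# One row per question: compare the plain count with the distinct count.
per_question = conn.execute("""
    SELECT q.id,
           COUNT(a.id)          AS joined_rows,   -- inflated by the votes join
           COUNT(DISTINCT a.id) AS num_answers,   -- actual number of answers
           SUM(v.value)         AS votes
    FROM questions q
    LEFT JOIN answers a ON a.question_id = q.id
    LEFT JOIN votes   v ON v.answer_id = a.id
    GROUP BY q.id
    ORDER BY q.id
""").fetchall()

print(per_question)
```

Note that this form returns aggregates per question (number of answers, total votes), not the content of any single answer.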

In Postgres there is an additional qualifier for select distinct, namely distinct on (...), but to control the result it should be combined with an order by clause:

The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. (ref)

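So this will reduce the number of rows returned, and the order by clause controls the output so that you get the most recently created answer: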
SELECT DISTINCT ON (questions.id)
 questions.id,
questions.title AS question_title,
questions.created_at AS questionCreatedAt,
questions.updated_at AS question_updated_at,
answers.id AS answer_id,
answers.content AS answer_content,
answers.created_at AS answer_created_at,
answers.updated_at AS answer_updated_at,
(SELECT SUM(votes.value) AS votes FROM votes WHERE answers.id =votes.answer_id)
FROM questions
LEFT JOIN answers ON questions.id = answers.question_id
LEFT JOIN votes ON answers.id = votes.answer_id
ORDER BY questions.id, answers.created_at DESC;

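A more generic (not Postgres-specific) solution for finding the "latest row" is to use row_number() over(). Additionally, instead of summing the votes via a "correlated subquery", first aggregate the votes by answer and then join to the remaining tables, like this: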

SELECT
  questions.id AS question_id,
  questions.title AS question_title,
  questions.created_at AS questionCreatedAt,
  questions.updated_at AS question_updated_at,
  answers.id AS answer_id,
  answers.content AS answer_content,
  answers.created_at AS answer_created_at,
  answers.updated_at AS answer_updated_at,
  votes.votes
FROM questions
LEFT JOIN ( SELECT
            a.*,
            ROW_NUMBER() OVER (PARTITION BY a.question_id ORDER BY a.created_at DESC) AS rn
            FROM answers AS a
  ) AS answers
  ON questions.id = answers.question_id
  AND answers.rn = 1
LEFT JOIN (
            SELECT v.answer_id, SUM(v.value) as votes
            FROM votes as v
            GROUP BY v.answer_id
  ) AS votes
  ON answers.id = votes.answer_id
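Since this approach is not Postgres-specific, it can be tried out with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The following is a minimal sketch with invented sample data and a reduced column list. It partitions by question_id so that rn = 1 marks the latest answer per question, and the extra a.id in the window ORDER BY is an assumed tiebreaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE answers   (id INTEGER PRIMARY KEY, question_id INTEGER,
                            content TEXT, created_at TEXT);
    CREATE TABLE votes     (id INTEGER PRIMARY KEY, answer_id INTEGER, value INTEGER);
    INSERT INTO questions VALUES (1, 'answered'), (2, 'unanswered');
    INSERT INTO answers   VALUES (1, 1, 'old', '2020-01-01'), (2, 1, 'new', '2020-02-01');
    INSERT INTO votes (answer_id, value) VALUES (1, 1), (2, 1), (2, 1);
""")

latest = conn.execute("""
    SELECT q.id, a.id, a.content, v.votes
    FROM questions q
    LEFT JOIN (
        -- Number the answers of each question, newest first.
        SELECT a.*,
               ROW_NUMBER() OVER (PARTITION BY a.question_id
                                  ORDER BY a.created_at DESC, a.id) AS rn
        FROM answers a
    ) a ON a.question_id = q.id AND a.rn = 1
    LEFT JOIN (
        -- Aggregate votes once per answer before joining.
        SELECT answer_id, SUM(value) AS votes
        FROM votes
        GROUP BY answer_id
    ) v ON v.answer_id = a.id
    ORDER BY q.id
""").fetchall()

print(latest)
```

Each question appears exactly once, with NULLs (None in Python) for questions that have no answer yet.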