Postgres: Inner Join with AND condition on same field

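A Quiz can have many Submissions. I want to fetch all Quizzes that have at least one associated Submission with submissions.correct = t and at least one associated Submission with submissions.correct = f.

How can I fix the following query, especially the WHERE clause, to achieve this: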

SELECT quizzes.*,
       Count(submissions.id) AS submissions_count
FROM   "quizzes"
       INNER JOIN "submissions"
               ON "submissions"."quiz_id" = "quizzes"."id"
WHERE  ( submissions.correct = 'f' )
       AND ( submissions.correct = 't' )
GROUP  BY quizzes.id
ORDER  BY submissions_count ASC

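Update:

Here is the missing information:

I need all row data from quizzes. I only need the count for ordering within the query (the quizzes with the least amount of submissions first).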

    k-voc_development=# \d quizzes;
                                         Table "public.quizzes"
       Column   |            Type             |                      Modifiers                       
    ------------+-----------------------------+------------------------------------------------------
     id         | integer                     | not null default nextval('quizzes_id_seq'::regclass)
     question   | character varying           | not null
     created_at | timestamp without time zone | not null
     updated_at | timestamp without time zone | not null
    Indexes:
        "quizzes_pkey" PRIMARY KEY, btree (id)
    Referenced by:
        TABLE "submissions" CONSTRAINT "fk_rails_04e433a811" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)
        TABLE "answers" CONSTRAINT "fk_rails_431b8a33a3" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)

    k-voc_development=# \d submissions;
                                         Table "public.submissions"
       Column   |            Type             |                        Modifiers                         
    ------------+-----------------------------+----------------------------------------------------------
     id         | integer                     | not null default nextval('submissions_id_seq'::regclass)
     quiz_id    | integer                     | not null
     correct    | boolean                     | not null
     created_at | timestamp without time zone | not null
     updated_at | timestamp without time zone | not null
    Indexes:
        "submissions_pkey" PRIMARY KEY, btree (id)
        "index_submissions_on_quiz_id" btree (quiz_id)
    Foreign-key constraints:
        "fk_rails_04e433a811" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)

    k-voc_development=# 

If submissions.correct has no values other than t and f, this will work:

SELECT quizzes.*,
       Count(submissions.id) AS submissions_count
FROM   "quizzes"
       INNER JOIN "submissions"
               ON "submissions"."quiz_id" = "quizzes"."id"
GROUP  BY quizzes.id
HAVING COUNT(DISTINCT submissions.correct) >= 2
ORDER  BY submissions_count ASC 
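
As a minimal sketch of why the original WHERE clause returns nothing while the HAVING version works: a WHERE predicate is tested against one row at a time, and no single row can have correct equal to both 'f' and 't'. The inline rows below are hypothetical sample data standing in for the submissions table, not data from the question.

```sql
-- Hypothetical inline rows standing in for the submissions table.
WITH s(quiz_id, correct) AS (
   VALUES (1, true), (1, false), (2, true), (2, true)
)
SELECT count(*) AS matching_rows          -- 0: the per-row test can never be satisfied
FROM   s
WHERE  correct = 'f'
AND    correct = 't';

-- Same hypothetical rows, but the test now runs per group instead of per row.
WITH s(quiz_id, correct) AS (
   VALUES (1, true), (1, false), (2, true), (2, true)
)
SELECT quiz_id,
       count(DISTINCT correct) AS distinct_values  -- 2 only if both true and false occur
FROM   s
GROUP  BY quiz_id
ORDER  BY quiz_id;
```

With this sample, quiz 1 has two distinct values and would pass the HAVING filter; quiz 2 has one and would not.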

Move the conditions from your WHERE clause to a HAVING clause with conditional count aggregates:

SELECT quizzes.*,
       Count(submissions.id) AS submissions_count
FROM   "quizzes"
       INNER JOIN "submissions"
               ON "submissions"."quiz_id" = "quizzes"."id"
GROUP  BY quizzes.id
HAVING Count(CASE WHEN submissions.correct = 'f' THEN 1 END) >= 1
        and Count(CASE WHEN submissions.correct = 't' THEN 1 END) >= 1
ORDER  BY submissions_count ASC 
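
The same conditional counts can also be written with the aggregate FILTER clause. This is a sketch that assumes PostgreSQL 9.4 or later, where FILTER is available; the table and column names are the ones from the question.

```sql
SELECT quizzes.*,
       count(submissions.id) AS submissions_count
FROM   quizzes
       INNER JOIN submissions
               ON submissions.quiz_id = quizzes.id
GROUP  BY quizzes.id
-- count only the rows of each group that pass the FILTER condition
HAVING count(*) FILTER (WHERE NOT submissions.correct) >= 1
   AND count(*) FILTER (WHERE submissions.correct) >= 1
ORDER  BY submissions_count ASC;
```

Since correct is a boolean column, it can be used directly as a condition without comparing it to 't' or 'f'.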

-- I want to fetch all Quizzes
SELECT * FROM quizzes q
WHERE EXISTS ( -- that have at least one associated Submission with submissions.correct = t
    SELECT * FROM submissions s
    WHERE s.quiz_id = q.id AND s.correct = 't'
    )
AND EXISTS ( -- and at least one associated Submission with submissions.correct = f.
    SELECT * FROM submissions s
    WHERE s.quiz_id = q.id AND s.correct = 'f'
    );
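
The EXISTS query returns the qualifying quizzes but carries no count to sort by. One possible way to add the ordering asked for in the update is a correlated subquery in ORDER BY. This is a sketch, not a tuned query; the alias names are arbitrary.

```sql
SELECT q.*
FROM   quizzes q
WHERE  EXISTS (   -- at least one correct submission
   SELECT 1 FROM submissions s
   WHERE  s.quiz_id = q.id AND s.correct
   )
AND    EXISTS (   -- at least one incorrect submission
   SELECT 1 FROM submissions s
   WHERE  s.quiz_id = q.id AND NOT s.correct
   )
-- quizzes with the fewest submissions first
ORDER  BY (SELECT count(*) FROM submissions s WHERE s.quiz_id = q.id);
```

The count subquery runs once per qualifying quiz, so this shape is likely to suit cases where few quizzes qualify.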

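The best solution depends on the details of your implementation, data distribution, and requirements.

If you have a typical installation with referential integrity (FK constraint), submissions.correct is defined as boolean NOT NULL, and you only need the quiz_id along with the total number of submissions, then you don't need to join to quizzes at all. This should be fastest: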

SELECT quiz_id, count(*) AS ct
FROM   submissions
-- WHERE  correct IS NOT NULL -- only relevant if correct can be NULL
GROUP  BY 1
HAVING bool_or(correct)
AND    bool_or(NOT correct);

The dedicated aggregate function bool_or() is particularly useful for tests on boolean values. It is simpler and faster than CASE expressions or similar constructs.
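
To illustrate what bool_or() computes per group, here is a small sketch on hypothetical inline rows (the VALUES list stands in for the submissions table and is not data from the question):

```sql
-- Hypothetical inline rows standing in for the submissions table.
SELECT quiz_id,
       count(*)             AS ct,
       bool_or(correct)     AS any_correct,    -- true if at least one row is true
       bool_or(NOT correct) AS any_incorrect   -- true if at least one row is false
FROM  (VALUES (1, true), (1, false), (2, true), (2, true), (3, false))
       AS s(quiz_id, correct)
GROUP  BY quiz_id
ORDER  BY quiz_id;
```

Only quiz 1 has both flags true, so it is the only group that would survive HAVING bool_or(correct) AND bool_or(NOT correct).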

There are many other techniques. The best solution depends on the missing information.

For your updated requirements

I need all row data from quizzes. I only need the count for ordering within the query (the quizzes with the least amount of submissions first).

If many quizzes qualify (a high percentage of the total), this should be fastest:

SELECT q.*
FROM  (
   SELECT quiz_id, count(*) AS ct
   FROM   submissions
   GROUP  BY 1
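   HAVING count(*) > count(correct OR NULL)   -- at least one incorrect submission
   AND    count(correct OR NULL) > 0          -- at least one correct submission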
   ) s
JOIN   quizzes q ON q.id = s.quiz_id
ORDER  BY s.ct;

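count(*) > count(correct OR NULL) works because correct is boolean NOT NULL: the expression correct OR NULL is NULL for every false row, and count() skips NULL values, so the comparison holds exactly when at least one submission is incorrect. For few submissions per quiz, it should be slightly faster than the variant above.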
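
A short sketch of the count(correct OR NULL) idiom on hypothetical inline rows (again, the VALUES list only stands in for the submissions table):

```sql
-- Three-valued logic: OR with NULL keeps true, but turns false into NULL.
SELECT true  OR NULL AS true_or_null,    -- yields true
       false OR NULL AS false_or_null;   -- yields NULL

-- Hypothetical inline rows standing in for the submissions table.
SELECT quiz_id,
       count(*)               AS total,        -- all rows in the group
       count(correct OR NULL) AS correct_ct    -- only rows where correct is true
FROM  (VALUES (1, true), (1, false), (2, true), (2, true), (3, false))
       AS s(quiz_id, correct)
GROUP  BY quiz_id
ORDER  BY quiz_id;
```

In this sample, quiz 3 has total greater than correct_ct although it has no correct submission at all. The comparison on its own only proves that an incorrect submission exists; also requiring correct_ct > 0 covers the other half of the condition.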