Postgres: Inner Join with AND condition on same field
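A Quiz can have many Submissions. I want to fetch all Quizzes that have at least one associated Submission with submissions.correct = t and at least one associated Submission with submissions.correct = f.
How can I fix the following query, especially the WHERE clause, to achieve this: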
SELECT quizzes.*,
Count(submissions.id) AS submissions_count
FROM "quizzes"
INNER JOIN "submissions"
ON "submissions"."quiz_id" = "quizzes"."id"
WHERE ( submissions.correct = 'f' )
AND ( submissions.correct = 't' )
GROUP BY quizzes.id
ORDER BY submissions_count ASC
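Update:
Here is the missing information:
I need all row data from quizzes. I only need the count for ordering within the query (the quizzes with the least amount of submissions first).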
k-voc_development=# \d quizzes;
Table "public.quizzes"
Column | Type | Modifiers
------------+-----------------------------+------------------------------------------------------
id | integer | not null default nextval('quizzes_id_seq'::regclass)
question | character varying | not null
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
Indexes:
"quizzes_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "submissions" CONSTRAINT "fk_rails_04e433a811" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)
TABLE "answers" CONSTRAINT "fk_rails_431b8a33a3" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)
k-voc_development=# \d submissions;
Table "public.submissions"
Column | Type | Modifiers
------------+-----------------------------+----------------------------------------------------------
id | integer | not null default nextval('submissions_id_seq'::regclass)
quiz_id | integer | not null
correct | boolean | not null
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
Indexes:
"submissions_pkey" PRIMARY KEY, btree (id)
"index_submissions_on_quiz_id" btree (quiz_id)
Foreign-key constraints:
"fk_rails_04e433a811" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)
k-voc_development=#
If submissions.correct has no values other than t and f, then this will work:
SELECT quizzes.*,
Count(submissions.id) AS submissions_count
FROM "quizzes"
INNER JOIN "submissions"
ON "submissions"."quiz_id" = "quizzes"."id"
GROUP BY quizzes.id
HAVING COUNT(DISTINCT submissions.correct) >= 2
ORDER BY submissions_count ASC
Move your WHERE conditions into a HAVING clause, using a conditional COUNT aggregate:
SELECT quizzes.*,
Count(submissions.id) AS submissions_count
FROM "quizzes"
INNER JOIN "submissions"
ON "submissions"."quiz_id" = "quizzes"."id"
GROUP BY quizzes.id
HAVING Count(CASE WHEN submissions.correct = 'f' THEN 1 END) >= 1
and Count(CASE WHEN submissions.correct = 't' THEN 1 END) >= 1
ORDER BY submissions_count ASC
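To see why the original WHERE clause cannot work while the HAVING version does, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for Postgres (so booleans are stored as 1/0 rather than 't'/'f'), and the sample rows are made up for illustration. A WHERE condition is evaluated per row, and no single row is both correct and incorrect; HAVING is evaluated per group after aggregation.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; booleans are stored as 1/0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quizzes (id INTEGER PRIMARY KEY, question TEXT NOT NULL);
    CREATE TABLE submissions (
        id INTEGER PRIMARY KEY,
        quiz_id INTEGER NOT NULL REFERENCES quizzes(id),
        correct BOOLEAN NOT NULL
    );
    -- Hypothetical sample data: quizzes 1 and 4 have both kinds of submissions.
    INSERT INTO quizzes (id, question) VALUES (1, 'q1'), (2, 'q2'), (3, 'q3'), (4, 'q4');
    INSERT INTO submissions (quiz_id, correct) VALUES
        (1, 1), (1, 0), (1, 0),
        (2, 0), (2, 0),
        (3, 1),
        (4, 1), (4, 0);
""")

# Row-level filter: each row is tested on its own, so the AND can never be true.
where_rows = conn.execute("""
    SELECT quizzes.id, count(submissions.id) AS submissions_count
    FROM quizzes
    INNER JOIN submissions ON submissions.quiz_id = quizzes.id
    WHERE submissions.correct = 0 AND submissions.correct = 1
    GROUP BY quizzes.id
    ORDER BY submissions_count ASC
""").fetchall()

# Group-level filter: each conditional count looks at all rows of the quiz.
having_rows = conn.execute("""
    SELECT quizzes.id, count(submissions.id) AS submissions_count
    FROM quizzes
    INNER JOIN submissions ON submissions.quiz_id = quizzes.id
    GROUP BY quizzes.id
    HAVING count(CASE WHEN submissions.correct = 0 THEN 1 END) >= 1
       AND count(CASE WHEN submissions.correct = 1 THEN 1 END) >= 1
    ORDER BY submissions_count ASC
""").fetchall()

print(where_rows)   # no rows
print(having_rows)  # quizzes with both kinds, fewest submissions first
```

With this sample data the WHERE variant returns nothing, while the HAVING variant returns quiz 4 (2 submissions) followed by quiz 1 (3 submissions).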
-- I want to fetch all Quizzes
SELECT * FROM quizzes q
WHERE EXISTS ( -- that have at least one associated Submission with submissions.correct = t
SELECT * FROM submissions s
WHERE s.quiz_id = q.id AND s.correct = 't'
)
AND EXISTS ( -- and at least one associated Submission with submissions.correct = f.
SELECT * FROM submissions s
WHERE s.quiz_id = q.id AND s.correct = 'f'
);
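The best solution depends on your implementation details, data distribution, and requirements.
If you have a typical setup with referential integrity (FK constraint), submissions.correct is defined as boolean NOT NULL, and you only need the quiz_id along with the total number of submissions, then you don't need to join to quizzes at all, and this should be fastest: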
SELECT quiz_id, count(*) AS ct
FROM submissions
-- WHERE correct IS NOT NULL -- only relevant if correct can be NULL
GROUP BY 1
HAVING bool_or(correct)
AND bool_or(NOT correct);
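The dedicated aggregate function bool_or() is particularly useful for boolean tests. It is simpler and faster than CASE expressions or similar constructs.
There are many other techniques; the best solution depends on the missing information.
Addressing your updated requirements:
"I need all row data from quizzes. I only need the count for ordering within the query (the quizzes with the least amount of submissions first)."
If many quizzes qualify (a high percentage of the total), this should be fastest: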
SELECT q.*
FROM (
SELECT quiz_id, count(*) AS ct
FROM submissions
GROUP BY 1
HAVING count(*) > count(correct OR NULL) -- at least one false submission
AND count(correct OR NULL) > 0 -- at least one true submission
) s
JOIN quizzes q ON q.id = s.quiz_id
ORDER BY s.ct;
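The condition count(*) > count(correct OR NULL) works because correct is boolean NOT NULL: count(correct OR NULL) counts only the rows where correct is true, so the inequality holds exactly when at least one submission is false. With few submissions per quiz, this should be slightly faster than the variant above.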
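The counting trick can be checked in isolation. The sketch below is an illustration under assumptions, not part of the original answer: it uses an in-memory SQLite database in place of Postgres (booleans stored as 1/0) and made-up rows. `correct OR NULL` evaluates to true for true rows and to NULL for false rows, and count() skips NULLs, so it yields the number of true submissions per quiz.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; three-valued logic for
# "x OR NULL" behaves the same way (true OR NULL = true, false OR NULL = NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissions (quiz_id INTEGER NOT NULL, correct BOOLEAN NOT NULL)"
)
# Hypothetical sample data: quiz 1 is mixed, quiz 2 is all false, quiz 3 is all true.
conn.executemany(
    "INSERT INTO submissions (quiz_id, correct) VALUES (?, ?)",
    [(1, 1), (1, 0), (1, 0), (2, 0), (2, 0), (3, 1)],
)

counts = conn.execute("""
    SELECT quiz_id,
           count(*)               AS total,    -- all submissions
           count(correct OR NULL) AS true_ct   -- only the true ones
    FROM submissions
    GROUP BY quiz_id
    ORDER BY quiz_id
""").fetchall()

# A quiz has a false submission when total > true_ct,
# and a true submission when true_ct > 0.
mixed = [quiz_id for quiz_id, total, true_ct in counts
         if total > true_ct and true_ct > 0]

print(counts)
print(mixed)
```

Here only quiz 1 satisfies both conditions; quiz 2 passes the total > true_ct test alone because all of its submissions are false.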