使用 LEFT JOIN 和子查询提高 PostgreSQL 查询的性能
Improve performance of PostgreSQL query with LEFT JOIN and subquery
使用 9.4.20 版本。
我很难尝试优化查询。我认为在查看 EXPLAIN
之后,问题出在第二个左联接的子查询中。请注意,我决定进行子查询,因为我有多个可被视为响应的来源,因此我试图通过(必要时)执行 UNION ALL
s 来在其中拥有一种同质数据源。为了问题的简单性,我没有做任何 UNION ALL
s 因为那不是我面临的主要问题。以下是事实:
查询
SELECT labels."labelID",
COUNT(DISTINCT "demographics"."user_id") as "memberCount",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '6d5f647f-6251-4ded-bb9f-9523debf2bd5' AND "t"."type" = 'Rating') AS "6d5f647f-6251-4ded-bb9f-9523debf2bd5__RA",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '40ff1921-9d62-4183-9de8-36d845d47494' AND "t"."type" = 'Rating') AS "40ff1921-9d62-4183-9de8-36d845d47494__RA"
FROM (
SELECT DISTINCT ("user_info"->>'gender') as "labelID",
("user_info"->>'gender') as "labelName"
FROM "user_demographics"
WHERE "iteration_id" = 2 AND "company_id" = 1
LIMIT 8
) as "labels"
LEFT JOIN "user_demographics" as "demographics" on "labels"."labelID" IS NOT DISTINCT FROM ("demographics"."user_info"->>'gender')::text and "demographics"."iteration_id" = 2
LEFT JOIN (
SELECT questions.uuid::uuid,
"responses"."id" as "response_id",
'Rating'::text as "type",
(groups.type)::text as "group_type",
"requests"."user_id" as user_id,
("responses"."content"->>'rating')::numeric as "value"
FROM responses
INNER JOIN request_questions ON request_questions.id = responses.request_question_id AND request_questions.question_id IN (SELECT id FROM questions WHERE uuid IN ('6d5f647f-6251-4ded-bb9f-9523debf2bd5', '40ff1921-9d62-4183-9de8-36d845d47494'))
INNER JOIN questions ON questions.id = request_questions.question_id
INNER JOIN requests ON requests.id = request_questions.request_id
INNER JOIN groups ON groups.id = requests.group_id
WHERE requests.submitted_at IS NOT NULL
) as t on t.user_id = demographics.user_id
group by "labels"."labelID", "labels"."labelName";
解释分析
GroupAggregate (cost=12894.34..12903.78 rows=8 width=306) (actual time=604.444..604.444 rows=1 loops=1)
Group Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
-> Sort (cost=12894.34..12894.93 rows=233 width=306) (actual time=555.363..566.431 rows=46019 loops=1)
Sort Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
Sort Method: external sort Disk: 4488kB
-> Hash Left Join (cost=2091.03..12885.18 rows=233 width=306) (actual time=497.782..531.016 rows=46019 loops=1)
Hash Cond: (demographics.user_id = requests.user_id)
-> Nested Loop Left Join (cost=1475.26..12266.78 rows=233 width=68) (actual time=6.326..16.015 rows=5898 loops=1)
Join Filter: (NOT (((user_demographics.user_info ->> 'gender'::text)) IS DISTINCT FROM (demographics.user_info ->> 'gender'::text)))
-> Limit (cost=1341.68..1341.80 rows=8 width=323) (actual time=5.918..5.920 rows=1 loops=1)
-> HashAggregate (cost=1341.68..1342.54 rows=57 width=323) (actual time=5.918..5.920 rows=1 loops=1)
Group Key: (user_demographics.user_info ->> 'gender'::text), (user_demographics.user_info ->> 'gender'::text)
-> Bitmap Heap Scan on user_demographics (cost=373.83..1340.31 rows=274 width=323) (actual time=1.160..4.462 rows=5898 loops=1)
Recheck Cond: ((iteration_id = 2) AND (company_id = 1))
Heap Blocks: exact=301
-> BitmapAnd (cost=373.83..373.83 rows=274 width=0) (actual time=1.121..1.121 rows=0 loops=1)
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.11 rows=5826 width=0) (actual time=0.403..0.403 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Bitmap Index Scan on user_demographics_company_id_fkey (cost=0.00..241.33 rows=10788 width=0) (actual time=0.695..0.695 rows=10792 loops=1)
Index Cond: (company_id = 1)
-> Materialize (cost=133.57..10123.82 rows=5826 width=327) (actual time=0.402..7.050 rows=5898 loops=1)
-> Bitmap Heap Scan on user_demographics demographics (cost=133.57..10094.69 rows=5826 width=327) (actual time=0.396..2.768 rows=5898 loops=1)
Recheck Cond: (iteration_id = 2)
Heap Blocks: exact=301
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.11 rows=5826 width=0) (actual time=0.366..0.366 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Hash (cost=612.69..612.69 rows=247 width=242) (actual time=491.430..491.430 rows=45722 loops=1)
Buckets: 1024 Batches: 2 (originally 1) Memory Usage: 4097kB
-> Nested Loop (cost=2.42..612.69 rows=247 width=242) (actual time=0.055..466.795 rows=45722 loops=1)
-> Nested Loop (cost=2.13..531.53 rows=247 width=214) (actual time=0.047..364.824 rows=45722 loops=1)
-> Nested Loop (cost=1.70..335.50 rows=340 width=28) (actual time=0.038..198.281 rows=45722 loops=1)
-> Nested Loop (cost=1.27..57.52 rows=597 width=24) (actual time=0.032..55.728 rows=51210 loops=1)
Join Filter: (questions_1.id = request_questions.question_id)
-> Nested Loop (cost=0.84..33.77 rows=2 width=24) (actual time=0.015..0.036 rows=2 loops=1)
-> Index Scan using questions_uuid_key on questions questions_1 (cost=0.42..16.87 rows=2 width=4) (actual time=0.008..0.017 rows=2 loops=1)
Index Cond: (uuid = ANY ('{6d5f647f-6251-4ded-bb9f-9523debf2bd5,40ff1921-9d62-4183-9de8-36d845d47494}'::uuid[]))
-> Index Scan using questions_pkey on questions (cost=0.42..8.44 rows=1 width=20) (actual time=0.005..0.007 rows=1 loops=2)
Index Cond: (id = questions_1.id)
-> Index Scan using request_questions_question_id_fkey on request_questions (cost=0.43..8.15 rows=298 width=12) (actual time=0.011..15.686 rows=25605 loops=2)
Index Cond: (question_id = questions.id)
-> Index Scan using requests_pkey on requests (cost=0.42..0.46 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=51210)
Index Cond: (id = request_questions.request_id)
Filter: (submitted_at IS NOT NULL)
Rows Removed by Filter: 0
-> Index Scan using responses_request_question_id_fkey on responses (cost=0.43..0.57 rows=1 width=194) (actual time=0.002..0.003 rows=1 loops=45722)
Index Cond: (request_question_id = request_questions.id)
-> Index Only Scan using groups_pkey on groups (cost=0.29..0.32 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=45722)
Index Cond: (id = requests.group_id)
Heap Fetches: 45722
Planning time: 6.715 ms
Execution time: 606.478 ms
经过分析EXPLAIN
,我认为主要问题出在这个子查询:
SELECT questions.uuid::uuid,
"responses"."id" as "response_id",
'Rating'::text as "type",
(groups.type)::text as "group_type",
"requests"."user_id" as user_id,
("responses"."content"->>'rating')::numeric as "value"
FROM responses
INNER JOIN request_questions ON request_questions.id = responses.request_question_id AND request_questions.question_id IN (SELECT id FROM questions WHERE uuid IN ('6d5f647f-6251-4ded-bb9f-9523debf2bd5', '40ff1921-9d62-4183-9de8-36d845d47494'))
INNER JOIN questions ON questions.id = request_questions.question_id
INNER JOIN requests ON requests.id = request_questions.request_id
INNER JOIN groups ON groups.id = requests.group_id
WHERE requests.submitted_at IS NOT NULL
我正试图找到一种方法来优化那个。如果对一组特定问题有很多答复,则可能需要一段时间。为了练习,重要的是要知道:
responses
在 request_question_id
上有一个索引
request_questions
在 question_id
和 request_id
上有一个索引
requests
在 group_id
上有一个索引
我还注意到,如果我在该子查询中添加一个额外的条件,例如:
AND groups.iteration_id = 2
它降低了性能,这对我来说是违反直觉的,因为 groups
在 iteration_id
上有一个索引。
如果需要更多信息,请告诉我。
UPDATE 按照 Laurenz Albe
的建议将问题 ID 作为参数传递
查询
SELECT labels."labelID",
COUNT(DISTINCT "demographics"."user_id") as "memberCount",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '6d5f647f-6251-4ded-bb9f-9523debf2bd5' AND "t"."type" = 'Rating') AS "6d5f647f-6251-4ded-bb9f-9523debf2bd5__RA",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '40ff1921-9d62-4183-9de8-36d845d47494' AND "t"."type" = 'Rating') AS "40ff1921-9d62-4183-9de8-36d845d47494__RA"
FROM (
SELECT DISTINCT ("user_info"->>'gender') as "labelID",
("user_info"->>'gender') as "labelName"
FROM "user_demographics"
WHERE "iteration_id" = 2 AND "company_id" = 1
LIMIT 8
) as "labels"
LEFT JOIN "user_demographics" as "demographics" on "labels"."labelID" IS NOT DISTINCT FROM ("demographics"."user_info"->>'gender')::text and "demographics"."iteration_id" = 2
LEFT JOIN (
SELECT questions.uuid::uuid,
"responses"."id" as "response_id",
'Rating'::text as "type",
(groups.type)::text as "group_type",
"requests"."user_id" as user_id,
("responses"."content"->>'rating')::numeric as "value"
FROM responses
INNER JOIN request_questions ON request_questions.id = responses.request_question_id AND request_questions.question_id IN (88,99)
INNER JOIN questions ON questions.id = request_questions.question_id
INNER JOIN requests ON requests.id = request_questions.request_id
INNER JOIN groups ON groups.id = requests.group_id
WHERE requests.submitted_at IS NOT NULL
) as t on t.user_id = demographics.user_id
group by "labels"."labelID", "labels"."labelName"
解释分析
GroupAggregate (cost=16424.72..16434.20 rows=8 width=306) (actual time=837.872..837.873 rows=1 loops=1)
Group Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
-> Sort (cost=16424.72..16425.31 rows=234 width=306) (actual time=787.578..798.965 rows=46019 loops=1)
Sort Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
Sort Method: external sort Disk: 4488kB
-> Nested Loop Left Join (cost=1481.27..16415.52 rows=234 width=306) (actual time=6.380..757.756 rows=46019 loops=1)
-> Nested Loop Left Join (cost=1479.27..12302.49 rows=234 width=68) (actual time=6.333..18.580 rows=5898 loops=1)
Join Filter: (NOT (((user_demographics.user_info ->> 'gender'::text)) IS DISTINCT FROM (demographics.user_info ->> 'gender'::text)))
-> Limit (cost=1345.56..1345.68 rows=8 width=323) (actual time=5.920..5.922 rows=1 loops=1)
-> HashAggregate (cost=1345.56..1346.42 rows=57 width=323) (actual time=5.919..5.921 rows=1 loops=1)
Group Key: (user_demographics.user_info ->> 'gender'::text), (user_demographics.user_info ->> 'gender'::text)
-> Bitmap Heap Scan on user_demographics (cost=374.19..1344.19 rows=275 width=323) (actual time=1.133..4.425 rows=5898 loops=1)
Recheck Cond: ((iteration_id = 2) AND (company_id = 1))
Heap Blocks: exact=301
-> BitmapAnd (cost=374.19..374.19 rows=275 width=0) (actual time=1.095..1.095 rows=0 loops=1)
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.24 rows=5843 width=0) (actual time=0.389..0.389 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Bitmap Index Scan on user_demographics_company_id_fkey (cost=0.00..241.56 rows=10819 width=0) (actual time=0.685..0.685 rows=10792 loops=1)
Index Cond: (company_id = 1)
-> Materialize (cost=133.70..10153.31 rows=5843 width=327) (actual time=0.405..8.636 rows=5898 loops=1)
-> Bitmap Heap Scan on user_demographics demographics (cost=133.70..10124.10 rows=5843 width=327) (actual time=0.401..3.085 rows=5898 loops=1)
Recheck Cond: (iteration_id = 2)
Heap Blocks: exact=301
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.24 rows=5843 width=0) (actual time=0.370..0.370 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Nested Loop (cost=2.00..17.57 rows=1 width=242) (actual time=0.020..0.122 rows=8 loops=5898)
-> Nested Loop (cost=1.71..17.24 rows=1 width=214) (actual time=0.018..0.103 rows=8 loops=5898)
-> Nested Loop (cost=1.29..16.06 rows=1 width=202) (actual time=0.016..0.084 rows=8 loops=5898)
-> Nested Loop (cost=0.86..11.07 rows=1 width=16) (actual time=0.012..0.055 rows=8 loops=5898)
-> Index Scan using entity_requests_user_id_fkey on requests (cost=0.42..6.38 rows=3 width=12) (actual time=0.003..0.007 rows=7 loops=5898)
Index Cond: (user_id = demographics.user_id)
Filter: (submitted_at IS NOT NULL)
Rows Removed by Filter: 1
-> Index Scan using request_questions_request_id_fkey on request_questions (cost=0.43..1.55 rows=1 width=12) (actual time=0.004..0.006 rows=1 loops=39600)
Index Cond: (request_id = requests.id)
Filter: (question_id = ANY ('{88,99}'::integer[]))
Rows Removed by Filter: 16
-> Index Scan using responses_request_question_id_fkey on responses (cost=0.43..4.99 rows=1 width=194) (actual time=0.003..0.003 rows=1 loops=45722)
Index Cond: (request_question_id = request_questions.id)
-> Index Scan using questions_pkey on questions (cost=0.42..1.16 rows=1 width=20) (actual time=0.001..0.002 rows=1 loops=45722)
Index Cond: (id = request_questions.question_id)
-> Index Only Scan using groups_pkey on groups (cost=0.29..0.32 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=45722)
Index Cond: (id = requests.group_id)
Heap Fetches: 45722
Planning time: 2.308 ms
Execution time: 839.766 ms
您的问题是 request_questions
上的索引扫描估计错误(289 而不是 25605)。
这导致 PostgreSQL 错误地选择了嵌套循环连接。
如果将查询拆分为两个查询,您可能会得到更好的估计:
获得所有questions.id
一个以 id
s 作为参数并完成其余部分。
您根据上述修改的查询中,以下部分又是一个错误估计:
(
SELECT DISTINCT ("user_info"->>'gender') as "labelID",
("user_info"->>'gender') as "labelName"
FROM "user_demographics"
WHERE "iteration_id" = 2 AND "company_id" = 1
LIMIT 8
) as "labels"
LEFT JOIN "user_demographics" as "demographics"
on "labels"."labelID" IS NOT DISTINCT FROM
("demographics"."user_info"->>'gender')::text
and "demographics"."iteration_id" = 2
看起来应该有更简单的写法,但我不知道数据,所以我不能说如何。更简单的条件可能会导致更好的估计。
如果您无法弄清楚,请尝试在不改变结果的连接条件中添加一个 OR
条件。因此会导致优化器估计更高,所以整体计划可能会更好(没有嵌套循环连接)。
使用 9.4.20 版本。
我很难尝试优化查询。我认为在查看 EXPLAIN
之后,问题出在第二个左联接的子查询中。请注意,我决定进行子查询,因为我有多个可被视为响应的来源,因此我试图通过(必要时)执行 UNION ALL
s 来在其中拥有一种同质数据源。为了问题的简单性,我没有做任何 UNION ALL
s 因为那不是我面临的主要问题。以下是事实:
查询
SELECT labels."labelID",
COUNT(DISTINCT "demographics"."user_id") as "memberCount",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '6d5f647f-6251-4ded-bb9f-9523debf2bd5' AND "t"."type" = 'Rating') AS "6d5f647f-6251-4ded-bb9f-9523debf2bd5__RA",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '40ff1921-9d62-4183-9de8-36d845d47494' AND "t"."type" = 'Rating') AS "40ff1921-9d62-4183-9de8-36d845d47494__RA"
FROM (
SELECT DISTINCT ("user_info"->>'gender') as "labelID",
("user_info"->>'gender') as "labelName"
FROM "user_demographics"
WHERE "iteration_id" = 2 AND "company_id" = 1
LIMIT 8
) as "labels"
LEFT JOIN "user_demographics" as "demographics" on "labels"."labelID" IS NOT DISTINCT FROM ("demographics"."user_info"->>'gender')::text and "demographics"."iteration_id" = 2
LEFT JOIN (
SELECT questions.uuid::uuid,
"responses"."id" as "response_id",
'Rating'::text as "type",
(groups.type)::text as "group_type",
"requests"."user_id" as user_id,
("responses"."content"->>'rating')::numeric as "value"
FROM responses
INNER JOIN request_questions ON request_questions.id = responses.request_question_id AND request_questions.question_id IN (SELECT id FROM questions WHERE uuid IN ('6d5f647f-6251-4ded-bb9f-9523debf2bd5', '40ff1921-9d62-4183-9de8-36d845d47494'))
INNER JOIN questions ON questions.id = request_questions.question_id
INNER JOIN requests ON requests.id = request_questions.request_id
INNER JOIN groups ON groups.id = requests.group_id
WHERE requests.submitted_at IS NOT NULL
) as t on t.user_id = demographics.user_id
group by "labels"."labelID", "labels"."labelName";
解释分析
GroupAggregate (cost=12894.34..12903.78 rows=8 width=306) (actual time=604.444..604.444 rows=1 loops=1)
Group Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
-> Sort (cost=12894.34..12894.93 rows=233 width=306) (actual time=555.363..566.431 rows=46019 loops=1)
Sort Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
Sort Method: external sort Disk: 4488kB
-> Hash Left Join (cost=2091.03..12885.18 rows=233 width=306) (actual time=497.782..531.016 rows=46019 loops=1)
Hash Cond: (demographics.user_id = requests.user_id)
-> Nested Loop Left Join (cost=1475.26..12266.78 rows=233 width=68) (actual time=6.326..16.015 rows=5898 loops=1)
Join Filter: (NOT (((user_demographics.user_info ->> 'gender'::text)) IS DISTINCT FROM (demographics.user_info ->> 'gender'::text)))
-> Limit (cost=1341.68..1341.80 rows=8 width=323) (actual time=5.918..5.920 rows=1 loops=1)
-> HashAggregate (cost=1341.68..1342.54 rows=57 width=323) (actual time=5.918..5.920 rows=1 loops=1)
Group Key: (user_demographics.user_info ->> 'gender'::text), (user_demographics.user_info ->> 'gender'::text)
-> Bitmap Heap Scan on user_demographics (cost=373.83..1340.31 rows=274 width=323) (actual time=1.160..4.462 rows=5898 loops=1)
Recheck Cond: ((iteration_id = 2) AND (company_id = 1))
Heap Blocks: exact=301
-> BitmapAnd (cost=373.83..373.83 rows=274 width=0) (actual time=1.121..1.121 rows=0 loops=1)
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.11 rows=5826 width=0) (actual time=0.403..0.403 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Bitmap Index Scan on user_demographics_company_id_fkey (cost=0.00..241.33 rows=10788 width=0) (actual time=0.695..0.695 rows=10792 loops=1)
Index Cond: (company_id = 1)
-> Materialize (cost=133.57..10123.82 rows=5826 width=327) (actual time=0.402..7.050 rows=5898 loops=1)
-> Bitmap Heap Scan on user_demographics demographics (cost=133.57..10094.69 rows=5826 width=327) (actual time=0.396..2.768 rows=5898 loops=1)
Recheck Cond: (iteration_id = 2)
Heap Blocks: exact=301
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.11 rows=5826 width=0) (actual time=0.366..0.366 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Hash (cost=612.69..612.69 rows=247 width=242) (actual time=491.430..491.430 rows=45722 loops=1)
Buckets: 1024 Batches: 2 (originally 1) Memory Usage: 4097kB
-> Nested Loop (cost=2.42..612.69 rows=247 width=242) (actual time=0.055..466.795 rows=45722 loops=1)
-> Nested Loop (cost=2.13..531.53 rows=247 width=214) (actual time=0.047..364.824 rows=45722 loops=1)
-> Nested Loop (cost=1.70..335.50 rows=340 width=28) (actual time=0.038..198.281 rows=45722 loops=1)
-> Nested Loop (cost=1.27..57.52 rows=597 width=24) (actual time=0.032..55.728 rows=51210 loops=1)
Join Filter: (questions_1.id = request_questions.question_id)
-> Nested Loop (cost=0.84..33.77 rows=2 width=24) (actual time=0.015..0.036 rows=2 loops=1)
-> Index Scan using questions_uuid_key on questions questions_1 (cost=0.42..16.87 rows=2 width=4) (actual time=0.008..0.017 rows=2 loops=1)
Index Cond: (uuid = ANY ('{6d5f647f-6251-4ded-bb9f-9523debf2bd5,40ff1921-9d62-4183-9de8-36d845d47494}'::uuid[]))
-> Index Scan using questions_pkey on questions (cost=0.42..8.44 rows=1 width=20) (actual time=0.005..0.007 rows=1 loops=2)
Index Cond: (id = questions_1.id)
-> Index Scan using request_questions_question_id_fkey on request_questions (cost=0.43..8.15 rows=298 width=12) (actual time=0.011..15.686 rows=25605 loops=2)
Index Cond: (question_id = questions.id)
-> Index Scan using requests_pkey on requests (cost=0.42..0.46 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=51210)
Index Cond: (id = request_questions.request_id)
Filter: (submitted_at IS NOT NULL)
Rows Removed by Filter: 0
-> Index Scan using responses_request_question_id_fkey on responses (cost=0.43..0.57 rows=1 width=194) (actual time=0.002..0.003 rows=1 loops=45722)
Index Cond: (request_question_id = request_questions.id)
-> Index Only Scan using groups_pkey on groups (cost=0.29..0.32 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=45722)
Index Cond: (id = requests.group_id)
Heap Fetches: 45722
Planning time: 6.715 ms
Execution time: 606.478 ms
经过分析EXPLAIN
,我认为主要问题出在这个子查询:
SELECT questions.uuid::uuid,
"responses"."id" as "response_id",
'Rating'::text as "type",
(groups.type)::text as "group_type",
"requests"."user_id" as user_id,
("responses"."content"->>'rating')::numeric as "value"
FROM responses
INNER JOIN request_questions ON request_questions.id = responses.request_question_id AND request_questions.question_id IN (SELECT id FROM questions WHERE uuid IN ('6d5f647f-6251-4ded-bb9f-9523debf2bd5', '40ff1921-9d62-4183-9de8-36d845d47494'))
INNER JOIN questions ON questions.id = request_questions.question_id
INNER JOIN requests ON requests.id = request_questions.request_id
INNER JOIN groups ON groups.id = requests.group_id
WHERE requests.submitted_at IS NOT NULL
我正试图找到一种方法来优化那个。如果对一组特定问题有很多答复,则可能需要一段时间。为了练习,重要的是要知道:
responses
在request_question_id
上有一个索引
request_questions
在question_id
和request_id
上有一个索引
requests
在group_id
上有一个索引
我还注意到,如果我在该子查询中添加一个额外的条件,例如:
AND groups.iteration_id = 2
它降低了性能,这对我来说是违反直觉的,因为 groups
在 iteration_id
上有一个索引。
如果需要更多信息,请告诉我。
UPDATE 按照 Laurenz Albe
的建议将问题 ID 作为参数传递查询
SELECT labels."labelID",
COUNT(DISTINCT "demographics"."user_id") as "memberCount",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '6d5f647f-6251-4ded-bb9f-9523debf2bd5' AND "t"."type" = 'Rating') AS "6d5f647f-6251-4ded-bb9f-9523debf2bd5__RA",
AVG("t"."value") FILTER(WHERE "t"."uuid" = '40ff1921-9d62-4183-9de8-36d845d47494' AND "t"."type" = 'Rating') AS "40ff1921-9d62-4183-9de8-36d845d47494__RA"
FROM (
SELECT DISTINCT ("user_info"->>'gender') as "labelID",
("user_info"->>'gender') as "labelName"
FROM "user_demographics"
WHERE "iteration_id" = 2 AND "company_id" = 1
LIMIT 8
) as "labels"
LEFT JOIN "user_demographics" as "demographics" on "labels"."labelID" IS NOT DISTINCT FROM ("demographics"."user_info"->>'gender')::text and "demographics"."iteration_id" = 2
LEFT JOIN (
SELECT questions.uuid::uuid,
"responses"."id" as "response_id",
'Rating'::text as "type",
(groups.type)::text as "group_type",
"requests"."user_id" as user_id,
("responses"."content"->>'rating')::numeric as "value"
FROM responses
INNER JOIN request_questions ON request_questions.id = responses.request_question_id AND request_questions.question_id IN (88,99)
INNER JOIN questions ON questions.id = request_questions.question_id
INNER JOIN requests ON requests.id = request_questions.request_id
INNER JOIN groups ON groups.id = requests.group_id
WHERE requests.submitted_at IS NOT NULL
) as t on t.user_id = demographics.user_id
group by "labels"."labelID", "labels"."labelName"
解释分析
GroupAggregate (cost=16424.72..16434.20 rows=8 width=306) (actual time=837.872..837.873 rows=1 loops=1)
Group Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
-> Sort (cost=16424.72..16425.31 rows=234 width=306) (actual time=787.578..798.965 rows=46019 loops=1)
Sort Key: ((user_demographics.user_info ->> 'gender'::text)), ((user_demographics.user_info ->> 'gender'::text))
Sort Method: external sort Disk: 4488kB
-> Nested Loop Left Join (cost=1481.27..16415.52 rows=234 width=306) (actual time=6.380..757.756 rows=46019 loops=1)
-> Nested Loop Left Join (cost=1479.27..12302.49 rows=234 width=68) (actual time=6.333..18.580 rows=5898 loops=1)
Join Filter: (NOT (((user_demographics.user_info ->> 'gender'::text)) IS DISTINCT FROM (demographics.user_info ->> 'gender'::text)))
-> Limit (cost=1345.56..1345.68 rows=8 width=323) (actual time=5.920..5.922 rows=1 loops=1)
-> HashAggregate (cost=1345.56..1346.42 rows=57 width=323) (actual time=5.919..5.921 rows=1 loops=1)
Group Key: (user_demographics.user_info ->> 'gender'::text), (user_demographics.user_info ->> 'gender'::text)
-> Bitmap Heap Scan on user_demographics (cost=374.19..1344.19 rows=275 width=323) (actual time=1.133..4.425 rows=5898 loops=1)
Recheck Cond: ((iteration_id = 2) AND (company_id = 1))
Heap Blocks: exact=301
-> BitmapAnd (cost=374.19..374.19 rows=275 width=0) (actual time=1.095..1.095 rows=0 loops=1)
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.24 rows=5843 width=0) (actual time=0.389..0.389 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Bitmap Index Scan on user_demographics_company_id_fkey (cost=0.00..241.56 rows=10819 width=0) (actual time=0.685..0.685 rows=10792 loops=1)
Index Cond: (company_id = 1)
-> Materialize (cost=133.70..10153.31 rows=5843 width=327) (actual time=0.405..8.636 rows=5898 loops=1)
-> Bitmap Heap Scan on user_demographics demographics (cost=133.70..10124.10 rows=5843 width=327) (actual time=0.401..3.085 rows=5898 loops=1)
Recheck Cond: (iteration_id = 2)
Heap Blocks: exact=301
-> Bitmap Index Scan on user_demographics_iteration_id_fkey (cost=0.00..132.24 rows=5843 width=0) (actual time=0.370..0.370 rows=5898 loops=1)
Index Cond: (iteration_id = 2)
-> Nested Loop (cost=2.00..17.57 rows=1 width=242) (actual time=0.020..0.122 rows=8 loops=5898)
-> Nested Loop (cost=1.71..17.24 rows=1 width=214) (actual time=0.018..0.103 rows=8 loops=5898)
-> Nested Loop (cost=1.29..16.06 rows=1 width=202) (actual time=0.016..0.084 rows=8 loops=5898)
-> Nested Loop (cost=0.86..11.07 rows=1 width=16) (actual time=0.012..0.055 rows=8 loops=5898)
-> Index Scan using entity_requests_user_id_fkey on requests (cost=0.42..6.38 rows=3 width=12) (actual time=0.003..0.007 rows=7 loops=5898)
Index Cond: (user_id = demographics.user_id)
Filter: (submitted_at IS NOT NULL)
Rows Removed by Filter: 1
-> Index Scan using request_questions_request_id_fkey on request_questions (cost=0.43..1.55 rows=1 width=12) (actual time=0.004..0.006 rows=1 loops=39600)
Index Cond: (request_id = requests.id)
Filter: (question_id = ANY ('{88,99}'::integer[]))
Rows Removed by Filter: 16
-> Index Scan using responses_request_question_id_fkey on responses (cost=0.43..4.99 rows=1 width=194) (actual time=0.003..0.003 rows=1 loops=45722)
Index Cond: (request_question_id = request_questions.id)
-> Index Scan using questions_pkey on questions (cost=0.42..1.16 rows=1 width=20) (actual time=0.001..0.002 rows=1 loops=45722)
Index Cond: (id = request_questions.question_id)
-> Index Only Scan using groups_pkey on groups (cost=0.29..0.32 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=45722)
Index Cond: (id = requests.group_id)
Heap Fetches: 45722
Planning time: 2.308 ms
Execution time: 839.766 ms
您的问题是 request_questions
上的索引扫描估计错误(289 而不是 25605)。
这导致 PostgreSQL 错误地选择了嵌套循环连接。
如果将查询拆分为两个查询,您可能会得到更好的估计:
获得所有
questions.id
一个以
id
s 作为参数并完成其余部分。
您根据上述修改的查询中,以下部分又是一个错误估计:
(
SELECT DISTINCT ("user_info"->>'gender') as "labelID",
("user_info"->>'gender') as "labelName"
FROM "user_demographics"
WHERE "iteration_id" = 2 AND "company_id" = 1
LIMIT 8
) as "labels"
LEFT JOIN "user_demographics" as "demographics"
on "labels"."labelID" IS NOT DISTINCT FROM
("demographics"."user_info"->>'gender')::text
and "demographics"."iteration_id" = 2
看起来应该有更简单的写法,但我不知道数据,所以我不能说如何。更简单的条件可能会导致更好的估计。
如果您无法弄清楚,请尝试在不改变结果的连接条件中添加一个 OR
条件。因此会导致优化器估计更高,所以整体计划可能会更好(没有嵌套循环连接)。