Calculating median with Group By in AWS Redshift
I've seen other posts about using the median() window function in Redshift, but how would you use it in a query that ends with a GROUP BY?
For example, assume a table named course:
Course | Subject | Num_Students
-------------------------------
1 | Math | 4
2 | Math | 6
3 | Math | 10
4 | Science | 2
5 | Science | 10
6 | Science | 12
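To reproduce the examples, the sample table could be set up roughly as follows. This is only a sketch: the column types are assumptions, since the question shows the data but not the DDL.

```sql
-- Hypothetical setup for the sample data; column types are assumed.
CREATE TABLE course (
    course       INT,
    subject      VARCHAR(32),
    num_students INT
);

-- The six rows shown in the table above.
INSERT INTO course (course, subject, num_students) VALUES
    (1, 'Math',     4),
    (2, 'Math',     6),
    (3, 'Math',    10),
    (4, 'Science',  2),
    (5, 'Science', 10),
    (6, 'Science', 12);
```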
I want to get the median number of students for each course subject. How would I write a query that gives the following result:
Subject | Median
-----------------------
Math | 6
Science | 10
I've tried:
SELECT
subject, median(num_students) over ()
FROM
course
GROUP BY 1
;
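But it lists every occurrence of each subject, with the same median across all subjects (this is fake data, so the value it actually returns is not 6; the point is only that it is identical for every subject):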
Subject | Median
-----------------------
Math | 6
Math | 6
Math | 6
Science | 6
Science | 6
Science | 6
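You haven't defined a partition in the window, so the median is computed over all rows at once. Instead of OVER() you need OVER(PARTITION BY subject).
The following will give you exactly the result you want: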
SELECT distinct
subject, median(num_students) over(partition by Subject)
FROM
course
order by Subject;
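If you also want to compute other aggregates by subject, such as avg(), you can compute the windowed median in a subquery (here a CTE) and aggregate in the outer query: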
WITH subject_numstudents_medianstudents AS (
SELECT
subject
, num_students
, median(num_students) over (partition BY subject) AS median_students
FROM
course
)
SELECT
subject
, median_students
, avg(num_students) as avg_students
FROM subject_numstudents_medianstudents
GROUP BY 1, 2
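You can simply drop the "over()" part. MEDIAN is also an aggregate function in Redshift, so with GROUP BY it returns one row per subject: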
SELECT subject, median(num_students) FROM course GROUP BY 1;
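Because the aggregate form works directly with GROUP BY, other aggregates can sit next to it without a CTE. The sketch below assumes the course table from the question; the column aliases are illustrative. The second query uses PERCENTILE_CONT, which the Redshift documentation describes MEDIAN as a special case of.

```sql
-- One row per subject: median and average under the same GROUP BY.
-- Note: depending on the column type, avg() of an integer column may come
-- back as an integer; cast the argument if a fractional average is needed.
SELECT
    subject,
    median(num_students) AS median_students,
    avg(num_students)    AS avg_students
FROM course
GROUP BY subject
ORDER BY subject;

-- Equivalent median written with PERCENTILE_CONT at the 0.5 percentile.
SELECT
    subject,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY num_students) AS median_students
FROM course
GROUP BY subject
ORDER BY subject;
```

For the sample data, both queries should give a median of 6 for Math and 10 for Science.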