SQL AWS Athena Group by Without a Column
I have this dataset:
patient_id doctor_id status created_at
1 1 A 2020-10-01 10:00:00
1 1 P 2020-10-01 10:30:00
1 1 U 2020-10-01 10:35:00
1 2 A 2020-10-01 10:40:00
...
I want to group by patient_id and doctor_id, but not by status, so that the result looks like this:
patient_id doctor_id status created_at
1 1 U 2020-10-01 10:35:00
1 2 A 2020-10-01 10:40:00
...
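AWS Athena requires every non-aggregated column in the SELECT list to appear in the GROUP BY, but I need the last status for each patient_id/doctor_id pair.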
Using ROW_NUMBER is one option:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY patient_id, doctor_id ORDER BY created_at DESC) rn
FROM yourTable
)
SELECT patient_id, doctor_id, status, created_at
FROM cte
WHERE rn = 1
ORDER BY patient_id, doctor_id;
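To make it concrete which rows the ROW_NUMBER query keeps, here is a small Python sketch that mimics it on the sample rows from the question. It only illustrates the semantics and is not Athena code; the names ROWS, rank_rows and latest_per_group are hypothetical. Ties on created_at are broken here by input order (Python's sort is stable), whereas ROW_NUMBER gives no guaranteed order among tied rows.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (patient_id, doctor_id, status, created_at)
ROWS = [
    (1, 1, "A", "2020-10-01 10:00:00"),
    (1, 1, "P", "2020-10-01 10:30:00"),
    (1, 1, "U", "2020-10-01 10:35:00"),
    (1, 2, "A", "2020-10-01 10:40:00"),
]


def _ts(row):
    # Parse created_at so ordering is by real timestamp, not by string
    return datetime.strptime(row[3], "%Y-%m-%d %H:%M:%S")


def _partition_key(row):
    return (row[0], row[1])  # PARTITION BY patient_id, doctor_id


def rank_rows(rows):
    """Hypothetical helper: append a row number per partition,
    newest created_at first (like ORDER BY created_at DESC)."""
    ordered = sorted(rows, key=_ts, reverse=True)
    # Stable sort by partition key keeps newest-first order inside each partition
    ordered.sort(key=_partition_key)
    ranked = []
    for _, partition in groupby(ordered, key=_partition_key):
        for rn, row in enumerate(partition, start=1):
            ranked.append(row + (rn,))
    return ranked


def latest_per_group(rows):
    """Hypothetical helper: equivalent of WHERE rn = 1, dropping the rn column."""
    return [row[:4] for row in rank_rows(rows) if row[4] == 1]


print(latest_per_group(ROWS))
```

With the sample data this keeps the "U" row for doctor 1 and the "A" row for doctor 2, which is the output the question asks for.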
In Athena/Presto, you can do this with the max_by function:
SELECT
patient_id,
doctor_id,
MAX_BY(status, created_at) AS last_status,
MAX(created_at) AS last_created_at
FROM the_table
GROUP BY 1, 2
The max_by(x, y) function returns the value of x from the row that has the maximum value of y in the group.
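A rough Python sketch of the same grouping, mimicking max_by(status, created_at) per patient_id/doctor_id on the sample rows from the question. The max_by and last_status functions below are hypothetical Python helpers written for illustration, not Presto functions. On ties, Python's max() returns the first maximal pair it sees; the sketch does not try to model how Presto breaks ties.

```python
from datetime import datetime

# Sample rows from the question: (patient_id, doctor_id, status, created_at)
ROWS = [
    (1, 1, "A", "2020-10-01 10:00:00"),
    (1, 1, "P", "2020-10-01 10:30:00"),
    (1, 1, "U", "2020-10-01 10:35:00"),
    (1, 2, "A", "2020-10-01 10:40:00"),
]


def max_by(pairs):
    """Hypothetical helper: given (x, y) pairs, return the x whose y is largest."""
    return max(pairs, key=lambda pair: pair[1])[0]


def last_status(rows):
    """Hypothetical helper: GROUP BY patient_id, doctor_id with
    max_by(status, created_at) as the aggregate."""
    groups = {}
    for patient_id, doctor_id, status, created_at in rows:
        ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        groups.setdefault((patient_id, doctor_id), []).append((status, ts))
    # One aggregated value per group, in group-key order
    return {key: max_by(pairs) for key, pairs in sorted(groups.items())}


print(last_status(ROWS))
```

Unlike the ROW_NUMBER approach, this needs no subquery: the aggregate picks the value directly inside the GROUP BY.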