SQL AWS Athena Group by Without a Column

I have this dataset:

patient_id   doctor_id   status   created_at
1            1           A        2020-10-01 10:00:00
1            1           P        2020-10-01 10:30:00
1            1           U        2020-10-01 10:35:00
1            2           A        2020-10-01 10:40:00
...

I want to group by patient_id and doctor_id without also grouping by status, so that the result looks like this:

patient_id   doctor_id   status   created_at
1            1           U        2020-10-01 10:35:00
1            2           A        2020-10-01 10:40:00
...

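AWS Athena requires every selected column that is not aggregated to appear in the GROUP BY, but I only need the latest status for each group.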

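ROW_NUMBER offers one option: number the rows in each (patient_id, doctor_id) partition from newest to oldest, then keep only the first row of each partition: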

WITH cte AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY patient_id, doctor_id ORDER BY created_at DESC) rn
    FROM yourTable
)

SELECT patient_id, doctor_id, status, created_at
FROM cte
WHERE rn = 1
ORDER BY patient_id, doctor_id;
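You can try the ROW_NUMBER approach without creating a table by inlining the sample rows from the question with a VALUES clause. This is a sketch that assumes integer IDs and TIMESTAMP values for created_at. The name `visits` is hypothetical and stands in for `yourTable`.

```sql
-- Hypothetical CTE "visits": the question's sample rows inlined with VALUES.
-- Swap it for the real table in practice.
WITH visits AS (
    SELECT *
    FROM (
        VALUES
            (1, 1, 'A', TIMESTAMP '2020-10-01 10:00:00'),
            (1, 1, 'P', TIMESTAMP '2020-10-01 10:30:00'),
            (1, 1, 'U', TIMESTAMP '2020-10-01 10:35:00'),
            (1, 2, 'A', TIMESTAMP '2020-10-01 10:40:00')
    ) AS t (patient_id, doctor_id, status, created_at)
),
ranked AS (
    SELECT
        patient_id,
        doctor_id,
        status,
        created_at,
        -- rn = 1 marks the most recent row within each (patient_id, doctor_id) pair
        ROW_NUMBER() OVER (
            PARTITION BY patient_id, doctor_id
            ORDER BY created_at DESC
        ) AS rn
    FROM ranked_source_placeholder
)
SELECT patient_id, doctor_id, status, created_at
FROM ranked
WHERE rn = 1
ORDER BY patient_id, doctor_id;
```

On this sample, the query should return the two rows from the desired output: (1, 1, U, 2020-10-01 10:35:00) and (1, 2, A, 2020-10-01 10:40:00). If two rows in a partition share the same created_at, ROW_NUMBER breaks the tie arbitrarily. Adding a second ORDER BY key makes the result deterministic.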

In Athena/Presto, you can do this with the max_by function:

SELECT
  patient_id,
  doctor_id,
  MAX_BY(status, created_at) AS last_status
FROM the_table
GROUP BY 1, 2

The max_by(x, y) function returns the value of x from the row in the group that has the maximum value of y.
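The query above returns only the status. The desired output also includes created_at. In each group, that timestamp is simply the maximum created_at, so a plain MAX can sit next to MAX_BY. The following sketch inlines the question's sample rows with VALUES, assuming TIMESTAMP values for created_at, so it runs without a table. The alias `t` is hypothetical and stands in for `the_table`.

```sql
SELECT
    patient_id,
    doctor_id,
    -- status taken from the row with the greatest created_at in the group
    MAX_BY(status, created_at) AS last_status,
    -- that same greatest created_at, i.e. the timestamp of the row above
    MAX(created_at) AS created_at
FROM (
    -- Hypothetical inline sample data; replace with the real table
    VALUES
        (1, 1, 'A', TIMESTAMP '2020-10-01 10:00:00'),
        (1, 1, 'P', TIMESTAMP '2020-10-01 10:30:00'),
        (1, 1, 'U', TIMESTAMP '2020-10-01 10:35:00'),
        (1, 2, 'A', TIMESTAMP '2020-10-01 10:40:00')
) AS t (patient_id, doctor_id, status, created_at)
GROUP BY patient_id, doctor_id
ORDER BY patient_id, doctor_id;
```

Unlike the ROW_NUMBER version, this needs only a single aggregation pass and no window function. As with ROW_NUMBER, if several rows in a group share the maximum created_at, the status MAX_BY picks among them is not guaranteed.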