How is this query using a window function returning multiple results per key?

I wrote the following query:

SELECT
    data.id,
    LAST_VALUE(data.access_type) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)
    ) AS access_type,
    LAST_VALUE(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)
    ) AS access_type_timestamp
FROM
    table

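where data is a struct.

I expect this to return one row per id, containing the most recent access_type and ts for that id. However, it still sometimes returns multiple rows per id.

What am I doing wrong?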

Use SELECT DISTINCT. I would also suggest FIRST_VALUE():

SELECT DISTINCT
    data.id,
    FIRST_VALUE(data.access_type) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP) DESC
    ) AS access_type,
    FIRST_VALUE(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP) DESC
    ) AS access_type_timestamp
FROM table;

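Window functions do not reduce the number of rows: the query returns one output row for every input row, so duplicates have to be removed explicitly.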
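
To make the row-count point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the original engine. The table name events, the flattened columns id, access_type and ts, and the sample rows are all assumptions made for illustration; it also assumes the bundled SQLite is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real one
# (the struct field data.id is flattened to a plain id column here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, access_type TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "read", 100), (1, "write", 200), (1, "admin", 300), (2, "read", 150)],
)

# Same shape as the suggested query; {distinct} is filled in below.
window_sql = """
    SELECT {distinct}
        id,
        FIRST_VALUE(access_type) OVER (PARTITION BY id ORDER BY ts DESC) AS access_type,
        MAX(ts) OVER (PARTITION BY id) AS access_type_ts
    FROM events
    ORDER BY id
"""

# Without DISTINCT: one output row per input row (four rows).
rows_without_distinct = conn.execute(window_sql.format(distinct="")).fetchall()

# With DISTINCT: the identical rows within each id collapse to one.
rows_with_distinct = conn.execute(window_sql.format(distinct="DISTINCT")).fetchall()

print(len(rows_without_distinct))  # 4
print(rows_with_distinct)          # [(1, 'admin', 300), (2, 'read', 150)]
```

DISTINCT only collapses the rows here because every row in a partition carries the same computed values, which is why FIRST_VALUE() with a descending sort is used.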

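In addition, I assume that ts sorts the same way as the timestamp derived from it, so the ORDER BY can be simplified. Also, the second expression is just a MAX():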

SELECT DISTINCT data.id,
       FIRST_VALUE(data.access_type) OVER (PARTITION BY data.id
                                           ORDER BY ts DESC
                                          ) AS access_type,
       MAX(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (PARTITION BY data.id) AS access_type_timestamp
FROM table;

If you do want to use LAST_VALUE(), then you need an explicit window frame clause:

LAST_VALUE(data.access_type) OVER (PARTITION BY data.id
                                   ORDER BY ts
                                   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                 ) AS access_type

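With an ORDER BY, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which has the bad habit of making LAST_VALUE() simply return the value from the current row (or from its last peer when several rows tie on the ORDER BY key). That is why the original query produces a different result on each row of an id, and therefore multiple rows per id.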
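
The effect of the default frame can be seen in a small sketch, again using Python's sqlite3 module as a stand-in for the original engine. The events table, its flattened columns and the sample rows are hypothetical, and window function support (SQLite 3.25 or newer) is assumed.

```python
import sqlite3

# Hypothetical sample data: three accesses for a single id, with unique ts values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, access_type TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "read", 100), (1, "write", 200), (1, "admin", 300)],
)

# Default frame: the window ends at the current row, so LAST_VALUE()
# just echoes each row's own access_type.
default_frame = conn.execute("""
    SELECT ts,
           LAST_VALUE(access_type) OVER (PARTITION BY id ORDER BY ts) AS access_type
    FROM events
    ORDER BY ts
""").fetchall()

# Explicit frame: the window extends to the end of the partition,
# so every row sees the most recent access_type.
explicit_frame = conn.execute("""
    SELECT ts,
           LAST_VALUE(access_type) OVER (
               PARTITION BY id
               ORDER BY ts
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS access_type
    FROM events
    ORDER BY ts
""").fetchall()

print(default_frame)   # [(100, 'read'), (200, 'write'), (300, 'admin')]
print(explicit_frame)  # [(100, 'admin'), (200, 'admin'), (300, 'admin')]
```

Even with the explicit frame there is still one output row per input row, so DISTINCT (or some other deduplication) is needed to get a single row per id.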