How is this query using a window function returning multiple results per key?
I wrote the following query:
SELECT
    data.id,
    LAST_VALUE(data.access_type) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)
    ) AS access_type,
    LAST_VALUE(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)
    ) AS access_type_timestamp
FROM
    table
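where data is a struct.
I expected this to return one row per id, with the most recent access_type and ts for that id. However, it sometimes still returns multiple rows per id.
What am I doing wrong?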
Use SELECT DISTINCT. I would also suggest FIRST_VALUE():
SELECT DISTINCT
    data.id,
    FIRST_VALUE(data.access_type) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP) DESC
    ) AS access_type,
    FIRST_VALUE(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (
        PARTITION BY data.id
        ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP) DESC
    ) AS access_type_timestamp
FROM table;
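Window functions do not reduce the number of rows: they return one row for every input row, so DISTINCT is what collapses the result to one row per id.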
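To make the row-count point concrete, here is a minimal sketch that runs the same kind of window expression with and without DISTINCT. It makes several assumptions that are not in the question: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, for window function support) as a stand-in for the asker's engine, a hypothetical table named events, and flat id and access_type columns instead of the data struct.

```python
import sqlite3

# Hypothetical flattened stand-in for the asker's table: id and access_type
# are plain columns here instead of fields of a `data` struct.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, access_type TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "read", 100), (1, "write", 200), (1, "admin", 300), (2, "read", 150)],
)

window_expr = "FIRST_VALUE(access_type) OVER (PARTITION BY id ORDER BY ts DESC)"

# Without DISTINCT: one output row per input row, even though the value
# is identical within each partition.
plain = conn.execute(
    f"SELECT id, {window_expr} AS access_type FROM events ORDER BY id"
).fetchall()

# With DISTINCT: the identical rows collapse to one row per id.
distinct = conn.execute(
    f"SELECT DISTINCT id, {window_expr} AS access_type FROM events ORDER BY id"
).fetchall()

print(plain)     # four rows, three of them for id 1
print(distinct)  # two rows, one per id
```

The window function already computes the right value on every row; DISTINCT only removes the repetition.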
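Also, I assume that ts sorts the same way as the converted timestamp, so the ORDER BY can be simplified. And the second expression is just MAX():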
SELECT DISTINCT
    data.id,
    FIRST_VALUE(data.access_type) OVER (
        PARTITION BY data.id
        ORDER BY ts DESC
    ) AS access_type,
    MAX(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (PARTITION BY data.id) AS access_type_timestamp
FROM table;
If you do use LAST_VALUE(), then you need a window frame clause:
LAST_VALUE(data.access_type) OVER (
    PARTITION BY data.id
    ORDER BY ts
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS access_type
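The default window frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which has the unfortunate effect of making LAST_VALUE() simply return the value in the current row (or, when several rows tie on the ORDER BY value, in the last of those tied rows). So in the original query each row carries its own access_type and timestamp rather than the latest ones for its id.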
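The effect of the frame can be seen side by side in a small sketch. As before, this assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of the asker's engine, a hypothetical events table with flat columns, and ordering by the raw ts value rather than the converted timestamp.

```python
import sqlite3

# Hypothetical sample data: three accesses for a single id, in ts order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, access_type TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "read", 100), (1, "write", 200), (1, "admin", 300)],
)

rows = conn.execute(
    """
    SELECT
        ts,
        -- Default frame: ends at the current row, so the "last" value
        -- is the current row's own value.
        LAST_VALUE(access_type) OVER (
            PARTITION BY id ORDER BY ts
        ) AS default_frame,
        -- Explicit frame: extends to the end of the partition.
        LAST_VALUE(access_type) OVER (
            PARTITION BY id ORDER BY ts
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        ) AS explicit_frame
    FROM events
    ORDER BY ts
    """
).fetchall()

for row in rows:
    print(row)
```

With the default frame the column differs from row to row, so even SELECT DISTINCT would not collapse the rows; with the explicit frame every row for the id carries the latest access_type.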