How to choose the best row inside a partition when using GROUP BY
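In BigQuery, I want to select a column that is not in the GROUP BY list, choosing its value by applying a condition to another column.
Say I have the following columns:
group, id, datecreated
and the following query:
select group, max(datecreated) from table group by group
I want the query to also return the id of the row that has max(datecreated).
As far as I understand, an aggregate function only applies to a single column. One idea is to concatenate datecreated and id, take the MAX(), and then extract the id with a regular expression.
I feel there should be a simpler solution.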
You can use window functions: partition by the group, order by time descending, and pick the first row.
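SELECT *
FROM
  (SELECT g,
          v,
          ROW_NUMBER() OVER (PARTITION BY g
                             ORDER BY t DESC) AS position
   FROM
     -- Inline sample rows. BigQuery legacy SQL treats a comma between
     -- subqueries as UNION ALL; it is written out explicitly here.
     (SELECT 1 AS g, 1 AS t, 10 AS v
      UNION ALL
      SELECT 1 AS g, 2 AS t, 20 AS v
      UNION ALL
      SELECT 1 AS g, 3 AS t, 15 AS v))
WHERE position = 1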
For this small dataset, this returns:
+---+----+----------+
| g | v  | position |
+---+----+----------+
| 1 | 15 | 1        |
+---+----+----------+
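To relate this back to the columns in the question, here is a minimal sketch of the same ROW_NUMBER() pattern applied to group, id and datecreated. It is not BigQuery itself: it runs the query against an in-memory SQLite database (assuming SQLite 3.25 or newer, which supports window functions) so it can be tried locally. The table name items, the column name grp (standing in for group, which is a reserved word) and the sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (grp, id, datecreated). "grp" stands in for the
# question's "group" column, which is a reserved word in SQL.
ROWS = [
    ("a", 1, "2020-01-01"),
    ("a", 2, "2020-03-01"),
    ("a", 3, "2020-02-01"),
    ("b", 4, "2020-05-01"),
    ("b", 5, "2020-04-01"),
]

# Number the rows inside each group from newest to oldest, then keep row 1.
QUERY = """
SELECT grp, id, datecreated
FROM (
    SELECT grp, id, datecreated,
           ROW_NUMBER() OVER (PARTITION BY grp
                              ORDER BY datecreated DESC) AS pos
    FROM items
)
WHERE pos = 1
ORDER BY grp
"""


def latest_per_group(rows):
    """Return one (grp, id, datecreated) row per group: the newest one."""
    conn = sqlite3.connect(":memory:")
    try:
        # "items" is an assumed table name used only for this sketch.
        conn.execute(
            "CREATE TABLE items (grp TEXT, id INTEGER, datecreated TEXT)"
        )
        conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_group(ROWS):
        print(row)
```

If two rows in a group share the same datecreated, ROW_NUMBER() picks one of them arbitrarily. Adding a second sort key, for example ORDER BY datecreated DESC, id DESC, should make the choice deterministic.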