Selecting the min num for each value of a column

Question

I have a table that looks like this:

col1       col2       num
a        <string1>     5
a        <string2>     10
a        <string3>     0
a        <string4>     7
b        <string1>     6
b        <string2>     3
b        <string3>     20
b        <string4>     1

I want to select the minimum num for each value of col1 (or, eventually, col2), so the desired output would be:

col1       col2        num
a        <string3>     0
b        <string4>     1

How can I do this? I am trying to do it in BigQuery.

Answer 1

If you are running Postgres (as tagged), you can use distinct on:

select distinct on (col1) t.*
from mytable t
order by col1, num
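One caveat with distinct on: if two rows in the same col1 group share the minimum num, which of them is returned is not determined by the order by above. A minimal sketch of a deterministic variant follows, assuming Postgres and assuming col2 is an acceptable tiebreaker (that choice is not from the question). The sample rows are inlined so the query runs on its own.

```sql
-- Sample data from the question, inlined so the query is self-contained.
with mytable (col1, col2, num) as (
  values
    ('a', '<string1>', 5),
    ('a', '<string2>', 10),
    ('a', '<string3>', 0),
    ('a', '<string4>', 7),
    ('b', '<string1>', 6),
    ('b', '<string2>', 3),
    ('b', '<string3>', 20),
    ('b', '<string4>', 1)
)
-- distinct on keeps the first row of each col1 group in order-by order.
-- col2 is added as an assumed tiebreaker for rows with equal num.
select distinct on (col1) t.*
from mytable t
order by col1, num, col2;
```

With the sample data there are no ties, so the tiebreaker changes nothing here; it only matters when the minimum is not unique.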

In BigQuery (as mentioned in the question), you can do this with arrays:

select array_agg(t order by num limit 1)[offset(0)].*
from mytable t
group by col1
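For comparison, the same greatest-n-per-group result can also be sketched with a window function, which is not the approach used in this answer but reads the same way in most SQL dialects. The example below assumes BigQuery standard SQL; the alias rn is a hypothetical name, and the sample rows are inlined so the query runs without an existing table.

```sql
#standardSQL
-- Sample data from the question, inlined so the query is self-contained.
with mytable as (
  select 'a' as col1, '<string1>' as col2, 5 as num union all
  select 'a', '<string2>', 10 union all
  select 'a', '<string3>', 0 union all
  select 'a', '<string4>', 7 union all
  select 'b', '<string1>', 6 union all
  select 'b', '<string2>', 3 union all
  select 'b', '<string3>', 20 union all
  select 'b', '<string4>', 1
)
select col1, col2, num
from (
  -- Number the rows of each col1 group from the smallest num upward.
  select t.*, row_number() over (partition by col1 order by num) as rn
  from mytable t
)
-- Keep only the first row of each group, i.e. the one with the minimum num.
where rn = 1
```

To take the minimum per col2 instead, as the question hints at, partitioning by col2 rather than col1 should be enough. BigQuery's QUALIFY clause can express the same filter without the subquery.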

Answer 2

The following is for BigQuery Standard SQL:

#standardSQL
select as value array_agg(t order by num limit 1)[offset(0)]
from `project.dataset.table` t
group by col1

If you apply it to the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  select 'a' col1, '<string1>' col2, 5 num union all
  select 'a', '<string2>', 10 union all
  select 'a', '<string3>', 0 union all
  select 'a', '<string4>', 7 union all
  select 'b', '<string1>', 6 union all
  select 'b', '<string2>', 3 union all
  select 'b', '<string3>', 20 union all
  select 'b', '<string4>', 1 
)
select as value array_agg(t order by num limit 1)[offset(0)]
from `project.dataset.table` t
group by col1

The output is:

col1       col2        num
a        <string3>     0
b        <string4>     1

sql

greatest-n-per-group

google-bigquery