Calculate the maximum of a column's values for rows sharing a common value in another column in PostgreSQL
I am trying to calculate the maximum of a column's values among rows that share a common id.
I have the following table as input:
TABLE 1:
| id | seq | score |
| ----- | ------ | ----- |
| UA502 | qrst | 8.2 |
| UA502 | abcdef | 2.2 |
| UA504 | yzab | 8.8 |
| UA504 | lmnop | 2.8 |
| UA503 | uvwx | 8.6 |
| UA503 | ghijk | 2.6 |
The desired output is:
| id | seq | score |
| ----- | ------ | ----- |
| UA502 | qrst | 8.2 |
| UA504 | yzab | 8.8 |
| UA503 | uvwx | 8.6 |
I use GROUP BY and the max function in a WITH query (max_calc) that runs on the output of another WITH query (union_data; TABLE 1).
max_calc as(
select id, seq, max(score)
from union_data
GROUP BY id
)
select * from max_calc
;
The error I get is:
Query Error: error: column "union_data.seq" must appear in the GROUP BY clause or be used in an aggregate function
I don't understand this error. I am grouping the data by the common id, not by seq. Why should I have to include the column "union_data.seq" in the GROUP BY?
Thanks
In Postgres, you can use the handy extension distinct on for this:
select distinct on (id) u.*
from union_data u
order by id, score desc
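GMB's answer is the best answer for three columns (and duly upvoted). However, if you want more aggregations, you can emulate a "first" aggregate function using arrays: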
select id,
(array_agg(seq order by score desc))[1] as seq,
max(score)
from union_data
group by id;
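For comparison, the same "top row per id" result can also be sketched with a window function, which is useful when the query has to run on databases that lack distinct on. This is not taken from either answer above. The inline `union_data` CTE is a stand-in built from the sample rows in the question, and the names `ranked` and `rn` are hypothetical.

```sql
-- Stand-in for the asker's union_data CTE, built from the sample rows.
with union_data (id, seq, score) as (
    values ('UA502', 'qrst',   8.2),
           ('UA502', 'abcdef', 2.2),
           ('UA504', 'yzab',   8.8),
           ('UA504', 'lmnop',  2.8),
           ('UA503', 'uvwx',   8.6),
           ('UA503', 'ghijk',  2.6)
),
-- Number the rows within each id, highest score first.
ranked as (
    select id, seq, score,
           row_number() over (partition by id order by score desc) as rn
    from union_data
)
-- Keep only the top-ranked row of each id.
select id, seq, score
from ranked
where rn = 1;
```

If two rows of the same id tie on score, row_number picks one of them arbitrarily unless another column is added to the window's order by.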
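The answers given so far show how to correct or circumvent the original error. However, they do not address the actual question about why the error occurs. So let's return to the original query.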
select id, seq, max(score)
from union_data
GROUP BY id
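This query raises the error. The reason is that the non-aggregated column seq is left out of the grouping. When an aggregate function also appears in the column list, the SQL rules require every non-aggregated column in the select list to appear in the "group by" clause. Grouping by id collapses each id to a single output row, yet each id has several seq values, so the database cannot tell which seq to return. That is why Postgres has the "distinct on" extension. It essentially lets you get around that rule, but it is not a free lunch: distinct on imposes requirements of its own.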
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the “first row” of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first. ... The DISTINCT ON expression(s) must match the
leftmost ORDER BY expression(s). The ORDER BY clause will normally
contain additional expression(s) that determine the desired precedence
of rows within each DISTINCT ON group.
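To make that requirement concrete, here is a minimal sketch that can be run on its own. The inline `union_data` CTE is a stand-in built from the sample rows in the question, and the tie-breaker on `seq` is an assumption added for illustration, since the question does not say how ties on score should be resolved.

```sql
-- Stand-in for the asker's union_data CTE, built from the sample rows.
with union_data (id, seq, score) as (
    values ('UA502', 'qrst',   8.2),
           ('UA502', 'abcdef', 2.2),
           ('UA504', 'yzab',   8.8),
           ('UA504', 'lmnop',  2.8),
           ('UA503', 'uvwx',   8.6),
           ('UA503', 'ghijk',  2.6)
)
-- id leads the ORDER BY, matching the DISTINCT ON expression.
-- score desc puts the highest-scoring row first within each id.
-- seq is a hypothetical tie-breaker that makes the choice deterministic.
select distinct on (id) id, seq, score
from union_data
order by id, score desc, seq;
```

Removing `id` from the front of the order by (for example, ordering by `score desc` alone) makes PostgreSQL reject the query, because the distinct on expression would no longer match the leftmost order by expression.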