How can I apply aggregate functions element-wise over arrays in PostgreSQL, e.g. weighted array sums over a group?
I have a table like this (see db<>fiddle):
grp | n | vals |
---|---|---|
0 | 2 | {1,2,3,4} |
1 | 5 | {3,2,1,2} |
1 | 3 | {0,5,4,3} |
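For each group (defined by `grp`), I want to perform some arithmetic involving each row's scalar `n` and array `vals`. Specifically, I want a kind of weighted sum: each row's `vals` is multiplied by its `n`, and the resulting arrays are summed element-wise within each group, giving one array per group: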
grp | result |
---|---|
0 | {2,4,6,8} |
1 | {15,25,17,19} |
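To pin down the intended semantics independently of SQL, here is a minimal Python model (an illustration only, not PostgreSQL code). It assumes the three sample rows above and that all arrays within a group have the same length; the function name is hypothetical.

```python
# Sample rows from the table above, as (grp, n, vals) tuples.
ROWS = [
    (0, 2, [1, 2, 3, 4]),
    (1, 5, [3, 2, 1, 2]),
    (1, 3, [0, 5, 4, 3]),
]


def weighted_elementwise_sum(rows):
    """Return {grp: array}, where each array is sum(n * vals) taken element-wise.

    Assumes every vals array within a group has the same length.
    """
    result = {}
    for grp, n, vals in rows:
        scaled = [n * v for v in vals]  # weight this row's array by its n
        if grp in result:
            # element-wise addition with the running total for the group
            result[grp] = [a + b for a, b in zip(result[grp], scaled)]
        else:
            result[grp] = scaled
    return result


print(weighted_elementwise_sum(ROWS))  # {0: [2, 4, 6, 8], 1: [15, 25, 17, 19]}
```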
Here is what I tried. It fails with the error `aggregate function calls cannot contain set-returning function calls`:
SELECT
grp,
ARRAY(SELECT SUM(n * UNNEST(vals)))
FROM
tbl
GROUP BY
grp
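The error comes with a hint, but I can't work out what it means for my use case.
The following collapses the desired arrays into a single scalar per group: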
SELECT
grp,
SUM(n * vals[i])
FROM
tbl,
generate_series(1, 4) i
GROUP BY
grp
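The query above returns one number per group because `GROUP BY grp` folds every position `i` into the same sum. The Python model below (illustrative only; the function names are hypothetical) contrasts the two groupings on the sample data: keeping `i` in the grouping key preserves one sum per position.

```python
# Sample rows from the question, as (grp, n, vals) tuples.
ROWS = [
    (0, 2, [1, 2, 3, 4]),
    (1, 5, [3, 2, 1, 2]),
    (1, 3, [0, 5, 4, 3]),
]
LENGTH = 4  # plays the role of generate_series(1, 4)


def sum_by_grp(rows):
    """Like GROUP BY grp: every position i lands in the same bucket."""
    totals = {}
    for grp, n, vals in rows:
        for i in range(1, LENGTH + 1):  # 1-based and inclusive, like generate_series
            totals[grp] = totals.get(grp, 0) + n * vals[i - 1]
    return totals


def sum_by_grp_and_i(rows):
    """Like GROUP BY grp, i: each position keeps its own bucket."""
    totals = {}
    for grp, n, vals in rows:
        for i in range(1, LENGTH + 1):
            totals[(grp, i)] = totals.get((grp, i), 0) + n * vals[i - 1]
    return totals


print(sum_by_grp(ROWS))  # {0: 20, 1: 76}
print(sum_by_grp_and_i(ROWS)[(1, 2)])  # 25
```

The per-position sums are exactly the elements of the desired result arrays. In SQL, adding `i` to the `GROUP BY` in a subquery and wrapping the outer query in `array_agg(... ORDER BY i)` should turn them back into arrays.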
This sort of works:
SELECT
grp,
SUM(n * vals[1]),
SUM(n * vals[2]),
SUM(n * vals[3]),
SUM(n * vals[4])
FROM
tbl
GROUP BY
grp
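But it doesn't produce an array, and it requires writing out every array element separately. My real arrays are much longer than four elements, so this is far too unwieldy.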
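WITH flattened AS (
-- one row per (group, array position) holding the weighted sum at that position
SELECT grp, position, SUM(val * n) AS s
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position)
GROUP BY grp, position
)
-- reassemble each group's sums into an array; the ORDER BY inside array_agg fixes element order
SELECT grp, array_agg(s ORDER BY position)
FROM flattened
GROUP BY grp
;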
+---+-------------------------------------------------------------------------------------+
|grp|array_agg |
+---+-------------------------------------------------------------------------------------+
|0 |{2.00000000000000000,4.00000000000000000,6.00000000000000000,8.00000000000000000} |
|1 |{15.00000000000000000,25.00000000000000000,17.00000000000000000,19.00000000000000000}|
+---+-------------------------------------------------------------------------------------+
Explanation:
You can use `UNNEST ... WITH ORDINALITY` to keep track of each value's position:
SELECT grp, position, val, n
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position);
+---+--------+---+-+
|grp|position|val|n|
+---+--------+---+-+
|0 |1 |1 |2|
|0 |2 |2 |2|
|0 |3 |3 |2|
|0 |4 |4 |2|
|1 |1 |3 |5|
|1 |2 |2 |5|
|1 |3 |1 |5|
|1 |4 |2 |5|
|1 |1 |0 |3|
|1 |2 |5 |3|
|1 |3 |4 |3|
|1 |4 |3 |3|
+---+--------+---+-+
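`WITH ORDINALITY` adds a 1-based counter next to each element that `unnest` produces, and the source row's other columns are repeated alongside. A rough Python equivalent (an illustrative model, not PostgreSQL code; the function name is hypothetical) is `enumerate(vals, start=1)`:

```python
# Sample rows from the question, as (grp, n, vals) tuples.
ROWS = [
    (0, 2, [1, 2, 3, 4]),
    (1, 5, [3, 2, 1, 2]),
    (1, 3, [0, 5, 4, 3]),
]


def unnest_with_ordinality(rows):
    """Yield (grp, position, val, n), one tuple per array element.

    Models FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position):
    each source row is repeated once per element, with a 1-based position.
    """
    for grp, n, vals in rows:
        for position, val in enumerate(vals, start=1):
            yield grp, position, val, n


flat = list(unnest_with_ordinality(ROWS))
print(len(flat))  # 12
print(flat[8])  # (1, 1, 0, 3)
```

The Python order is deterministic, whereas in SQL the row order of this intermediate result is not guaranteed without `ORDER BY`. That is why the later steps rely on the `position` column rather than on row order.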
Then `GROUP BY` the original group and the position:
SELECT grp, position, SUM(val * n) AS s
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position)
GROUP BY grp, position
ORDER BY grp, position;
+---+--------+--+
|grp|position|s |
+---+--------+--+
|0 |1 |2 |
|0 |2 |4 |
|0 |3 |6 |
|0 |4 |8 |
|1 |1 |15|
|1 |2 |25|
|1 |3 |17|
|1 |4 |19|
+---+--------+--+
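Grouping by `(grp, position)` means each output row sums `val * n` over all rows of the group that share that array position. In Python (again only a model, with a hypothetical function name; the flattened tuples mirror the twelve rows listed earlier), that is a dictionary keyed by the pair:

```python
# Sample rows from the question, as (grp, n, vals) tuples.
ROWS = [
    (0, 2, [1, 2, 3, 4]),
    (1, 5, [3, 2, 1, 2]),
    (1, 3, [0, 5, 4, 3]),
]

# Same flattening as unnest(vals) WITH ORDINALITY: (grp, position, val, n).
FLAT = [
    (grp, position, val, n)
    for grp, n, vals in ROWS
    for position, val in enumerate(vals, start=1)
]


def sum_by_group_and_position(flat_rows):
    """Model of SELECT grp, position, SUM(val * n) ... GROUP BY grp, position."""
    sums = {}
    for grp, position, val, n in flat_rows:
        key = (grp, position)
        sums[key] = sums.get(key, 0) + val * n
    return sums


# Prints one line per (grp, position), in the same order as the table above.
for (grp, position), s in sorted(sum_by_group_and_position(FLAT).items()):
    print(grp, position, s)
```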
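After that, all that remains is the `array_agg(s ORDER BY position)` from the full query, which collects each group's sums into one array in position order.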
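`array_agg(s ORDER BY position)` gathers each group's sums into one array, ordered by position rather than by whatever order the rows happen to arrive in. A Python model of that last step (illustrative; the function name is hypothetical), fed the per-position sums from the table above:

```python
# Per-position sums from the table above: (grp, position) -> s.
SUMS = {
    (0, 1): 2, (0, 2): 4, (0, 3): 6, (0, 4): 8,
    (1, 1): 15, (1, 2): 25, (1, 3): 17, (1, 4): 19,
}


def array_agg_by_position(sums):
    """Model of SELECT grp, array_agg(s ORDER BY position) ... GROUP BY grp."""
    arrays = {}
    # Sorting on (grp, position) plays the role of ORDER BY position within each group.
    for (grp, _position), s in sorted(sums.items()):
        arrays.setdefault(grp, []).append(s)
    return arrays


print(array_agg_by_position(SUMS))  # {0: [2, 4, 6, 8], 1: [15, 25, 17, 19]}
```

The model assumes positions have no gaps, as is the case when every array in a group has the same length.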
I would write functions for this; otherwise the SQL gets very messy.
A function that multiplies every element by a given value:
create function array_mul(p_input real[], p_mul int)
returns real[]
as
$$
-- scale each element, rebuilding the array in its original order (idx)
select array(select i * p_mul
from unnest(p_input) with ordinality as t(i,idx)
order by idx);
$$
language sql
immutable;
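A Python model of `array_mul` (an illustration of how the SQL behaves, not a PostgreSQL API; `None` stands for SQL NULL). The function is not declared `strict`, so a NULL input array reaches the body; `unnest(NULL)` yields no rows, so `ARRAY(...)` builds an empty array. A NULL element, or a NULL `p_mul`, gives NULL products.

```python
def array_mul(p_input, p_mul):
    """Model of the SQL array_mul: scale every element, keeping the original order.

    None stands in for SQL NULL.
    """
    if p_input is None:
        # unnest(NULL) produces no rows, so ARRAY(SELECT ...) returns '{}'.
        return []
    # NULL * p_mul is NULL, and anything * NULL is NULL.
    return [None if x is None or p_mul is None else x * p_mul for x in p_input]


print(array_mul([3.0, 2.0, 1.0, 2.0], 5))  # [15.0, 10.0, 5.0, 10.0]
```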
And a function that adds up elements with the same index, to serve as the state transition function of an aggregate:
create or replace function array_add(p_one real[], p_two real[])
returns real[]
as
$$
declare
l_idx int;
l_result real[];
begin
-- if either array is NULL, return the other one unchanged
if p_one is null or p_two is null then
return coalesce(p_one, p_two);
end if;
-- add element-wise; an index past the end of the shorter array reads as NULL and counts as 0
for l_idx in 1..greatest(cardinality(p_one), cardinality(p_two)) loop
l_result[l_idx] := coalesce(p_one[l_idx],0) + coalesce(p_two[l_idx], 0);
end loop;
return l_result;
end;
$$
language plpgsql
immutable;
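A Python model of `array_add` (illustrative only; `None` stands for SQL NULL) makes its edge cases explicit. A NULL argument returns the other one. A position missing from the shorter array counts as 0, because an out-of-range subscript yields NULL in PostgreSQL. When both inputs are empty the loop never runs, so the still-unassigned `l_result` (NULL) is returned.

```python
def array_add(p_one, p_two):
    """Model of the PL/pgSQL array_add; None stands in for SQL NULL."""
    if p_one is None or p_two is None:
        # coalesce(p_one, p_two)
        return p_one if p_one is not None else p_two

    def element(arr, idx):
        # coalesce(arr[idx], 0): out-of-range and NULL elements both become 0
        return arr[idx] if idx < len(arr) and arr[idx] is not None else 0

    length = max(len(p_one), len(p_two))
    if length == 0:
        # The FOR loop never runs, so l_result is still NULL.
        return None
    return [element(p_one, i) + element(p_two, i) for i in range(length)]


print(array_add([15.0, 10.0, 5.0, 10.0], [0.0, 15.0, 12.0, 9.0]))  # [15.0, 25.0, 17.0, 19.0]
```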
This can then be used to define a custom aggregate:
create aggregate array_element_sum(real[]) (
sfunc = array_add,
stype = real[],
initcond = '{}' -- start from an empty array; array_add pads it to the input's length
);
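Without a `finalfunc`, an aggregate like this behaves as a left fold: the state starts at `initcond` (`'{}'`), `sfunc` is applied to the state and each input value in turn, and the final state is the result. Since `array_add` is not strict, NULL inputs are passed to it too. A Python model of that fold (illustrative; it re-declares a compact `array_add` so the snippet runs on its own):

```python
from functools import reduce


def array_add(p_one, p_two):
    """Compact model of the PL/pgSQL array_add (None stands in for SQL NULL)."""
    if p_one is None or p_two is None:
        return p_one if p_one is not None else p_two

    def element(arr, idx):
        return arr[idx] if idx < len(arr) and arr[idx] is not None else 0

    length = max(len(p_one), len(p_two))
    if length == 0:
        return None  # loop never runs, l_result stays NULL
    return [element(p_one, i) + element(p_two, i) for i in range(length)]


def array_element_sum(arrays):
    """Model of the custom aggregate: fold sfunc over the inputs, starting from initcond."""
    initcond = []  # initcond = '{}'
    return reduce(array_add, arrays, initcond)


# Scaled arrays of group 1, i.e. array_mul(vals, n) for its two rows.
print(array_element_sum([[15.0, 10.0, 5.0, 10.0], [0.0, 15.0, 12.0, 9.0]]))  # [15.0, 25.0, 17.0, 19.0]
```

Because the state type is `real[]`, the sums in PostgreSQL are single-precision floats; for large values a `double precision[]` or `numeric[]` variant of the functions may be preferable.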
Your query then becomes as simple as:
select grp, array_element_sum(array_mul(vals, n))
from tbl
group by grp;