Calculate Avg in for loop for columns in a table in PostgreSQL

Question

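I come from the Python world, where many things are colorful and easy. Now I am trying to make my way into SQL because, well, I want to challenge myself outside of pandas and gain solid experience in SQL. That said, here is my problem. I have the following snippet: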

do 
$do$
declare i varchar(50); 
declare average int; 
begin
    for i in (
        select column_name
        FROM information_schema.columns
        where table_schema = 'public'
        and table_name = 'example_table' 
        and column_name like '%suffix') loop 
            --raise notice 'Value: %', i; 
            select AVG(i) as average from example_table; 
            raise notice 'Value: %', i;
        end loop; 
end; 
$do$

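From the SQL documentation I understood that a for loop can only be used inside a do block and that certain variables have to be declared. I did this for the variable i, which holds the name of the column I want to iterate over. What I want is to get the average of each such column and add it as a row to a table with two columns: one for the feature (the i variable) and one for that column's average. I thought my snippet above would do that, but I get an error saying Function avg(character varying) does not exist. When I use AVG on a single column outside the for loop, it does return the average of that numeric column, but inside the for loop it says this aggregate function does not exist. Can someone help me with this?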

Update: I took a step back and tried to make the story shorter:

select column_name
        FROM information_schema.columns
        where table_schema = 'public'
        and table_name = 'my_table' 
        and column_name like '%wildcard';

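This snippet produces a table with a single column named column_name, listing all columns that satisfy the constraints in the where clause. I just want to add a column containing the average of each of those columns.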

Answer 1

If you only need this for a single table, you can use:

select x.col, avg(x.value::numeric)
from example_table t
 cross join lateral (
    select col, value
    from jsonb_each(to_jsonb(t)) as e(col, value)
    where jsonb_typeof(e.value) = 'number'
 ) x
group by x.col;

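The "magic" is in converting each row of the table into a JSON value. That is what to_jsonb(t) does (t is the alias of the table in the main query). So we get something like {"name": "Bla", "value": 3.14, "length": 10, "some_date": "2022-03-02"}. Each column name becomes a key in the JSON value.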
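To see this step in isolation, here is a minimal sketch. The inline example_table below is a hypothetical stand-in built with a values list, so the statement can be run without creating anything; its columns simply mirror the sample JSON above.

```sql
-- Hypothetical one-row stand-in for example_table (assumption, not the real table)
with example_table (name, value, length, some_date) as (
  values ('Bla', 3.14, 10, date '2022-03-02')
)
-- to_jsonb(t) turns the whole row into a single JSON object,
-- using the column names as keys
select to_jsonb(t) as row_as_json
from example_table t;
```

Against a real table, the with clause would be dropped and the select run as is.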

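The jsonb_each() function then turns this JSON into one row per column (= key), keeping only the rows (= columns) that hold a numeric value. So the derived table returns one row per column for every row in the table. The outer query then simply aggregates per column. The downside is that you need to write one query per table.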
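The question only wants columns whose name ends in a given suffix. A possible adaptation, sketched under the assumption that the columns are called price_suffix and weight_suffix (hypothetical names, as is the inline stand-in table), adds a filter on the key next to the type check:

```sql
-- Hypothetical stand-in data; replace the CTE with the real table
with example_table (id, price_suffix, weight_suffix, label) as (
  values (1, 10.0, 2.5, 'a'),
         (2, 20.0, 3.5, 'b')
)
select e.col as column_name,
       avg(e.value::numeric) as average
from example_table t
  -- one output row per (table row, column) pair
  cross join lateral jsonb_each(to_jsonb(t)) as e(col, value)
where e.col like '%suffix'                 -- only the columns of interest
  and jsonb_typeof(e.value) = 'number'     -- skip non-numeric and null values
group by e.col;
```

With this sample data the averages come out as 15 for price_suffix and 3 for weight_suffix. The cast from jsonb to numeric needs a reasonably recent PostgreSQL version (11 or later, as far as I know).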

If you need a report for all tables in a schema, you can use a variation of this answer:

with all_selects as (
  select table_schema, table_name, 'select '||string_agg(format('avg(%I) as %I', column_name, column_name), ', ')||format(' from %I.%I', table_schema, table_name) as query
  from information_schema.columns
  where table_schema = 'public'
    and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
  group by table_schema, table_name
), all_aggregates as (
   select table_schema, table_name, 
          query_to_xml(query, true, true, '') as result
   from all_selects
)
select ag.table_schema, ag.table_name, r.column_name, nullif(r.average, '')::numeric as average
from all_aggregates ag
  cross join xmltable('/row/*' passing result
     columns column_name text path 'local-name()', 
             average text path '.') as r

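This is a bit tricky. The first part, all_selects, builds one query for each table in the schema public that applies the avg() aggregate to every column that can contain a number (where data_type in (...)).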

For example, this returns a string like select avg(value) as value, avg(length) as length from public.example_table
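To inspect the generated statements before executing anything, the first step can be run on its own. The sketch below narrows it to the table and suffix from the question; the table name example_table and the '%suffix' pattern are taken from the question and need to match your schema.

```sql
-- Build (but do not run) one "select avg(..), avg(..) from ..." statement
select table_schema,
       table_name,
       'select '
         || string_agg(format('avg(%I) as %I', column_name, column_name), ', ')
         || format(' from %I.%I', table_schema, table_name) as query
from information_schema.columns
where table_schema = 'public'
  and table_name = 'example_table'
  and column_name like '%suffix'
  and data_type in ('bigint', 'integer', 'double precision',
                    'smallint', 'numeric', 'real')
group by table_schema, table_name;
```

The %I placeholder in format() quotes identifiers where needed, so column names containing upper case letters or spaces still produce valid SQL.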

The next step is to run all of those queries through query_to_xml() (sadly there is no built-in query_to_jsonb()).

query_to_xml() returns something like:

<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <value>12.345</value>
  <length>42</length>
</row>

So there is one tag per column, holding the result of the avg(..) function.

The final select then uses xmltable() to turn each tag of the XML result into a row, returning the column name and the value.
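The XML round trip can be tried without any table at all. This is a small sketch that feeds a constant query (made up for illustration, with the same values as the sample XML above) through query_to_xml() and unpacks it with xmltable():

```sql
select r.column_name,
       nullif(r.average, '')::numeric as average
from xmltable('/row/*'
       -- the inner query is a constant stand-in for a generated avg() query
       passing query_to_xml('select 12.345 as value, 42 as length',
                            true, true, '')
       columns column_name text path 'local-name()',  -- the tag name
               average     text path '.'              -- the tag content
     ) as r;
```

The nullif() guards against empty tags, which appear when an average is null (for example for an empty table), since an empty string cannot be cast to numeric.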

Online example

Of course you can also do this in PL/pgSQL:

do 
$do$
declare 
  l_rec record;
  l_sql text;
  l_average numeric;
begin
    for l_rec in 
        select table_schema, table_name, column_name
        from information_schema.columns
        where table_schema = 'public'
          and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
    loop 
      l_sql := format('select avg(%I) from %I.%I', l_rec.column_name, l_rec.table_schema, l_rec.table_name);
      execute l_sql
         into l_average;
      raise notice 'Average for %.% is: %', l_rec.table_name, l_rec.column_name, l_average;
    end loop; 
end; 
$do$

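Note the condition on the column data_type, which makes sure only columns that can be averaged are processed. However, this is more expensive, because it runs one query per column rather than one per table.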
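The original snippet from the question fails because avg(i) passes the value of the variable i (a string holding the column name) to avg(), not the column itself; identifiers have to go through dynamic SQL. A minimal sketch of the question's loop along those lines follows. The temp table column_averages is a hypothetical name introduced here to collect the results, since a do block cannot return rows.

```sql
-- Hypothetical result table: one row per matching column
create temp table column_averages (
  column_name text,
  average     numeric
);

do
$do$
declare
  l_col     text;
  l_average numeric;
begin
  for l_col in
    select column_name
    from information_schema.columns
    where table_schema = 'public'
      and table_name = 'example_table'
      and column_name like '%suffix'
      and data_type in ('bigint', 'integer', 'double precision',
                        'smallint', 'numeric', 'real')
  loop
    -- %I inserts the column name as a properly quoted identifier
    execute format('select avg(%I) from public.example_table', l_col)
      into l_average;

    insert into column_averages (column_name, average)
    values (l_col, l_average);
  end loop;
end;
$do$;

select * from column_averages;
```

This still runs one query per column, so for wide tables the single-statement variants above are likely the cheaper choice.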

sql

postgresql

average