How to get the total row count and max(timestamp) column in the select list for all tables in a particular schema
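We have a read-only database based on Postgres, with 52 tables under one schema.
We are trying to output the row count and the max(timestamp) column for every table in that schema.
The environment is:
PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.3.0.2 build 14421) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2
We tried: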
SELECT
nspname AS schemaname,relname,reltuples,max(time)
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
nspname NOT IN ('pg_catalog', 'information_schema') AND
relkind='r'
ORDER BY reltuples DESC;
With this query we get the row count column, but we still cannot get max(timestamp) for all the tables.
Any help would be much appreciated.
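What you are reading with this query is database statistics, which are not 100% accurate and may be missing or out of date, depending on how you collect statistics.
To get the exact row count for a list of tables, you have to scan each table. However, you can use pg_relation_size() to find out a table's size in bytes, and this function does not require scanning the table.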
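As a rough sketch of the pg_relation_size() idea, the query below lists the size in bytes of every ordinary table in one schema without scanning any of them. The schema name 'my_schema' is a placeholder, and I am assuming pg_relation_size(oid) behaves in your Greenplum/HAWQ build the way it does in stock PostgreSQL, where it reports the table's main data only and excludes indexes.

```sql
-- Sketch only: 'my_schema' is a placeholder schema name.
-- Reads the catalog plus file sizes; no table scan is involved.
select n.nspname || '.' || c.relname as table_name,
       pg_relation_size(c.oid) as size_bytes
from pg_namespace as n, pg_class as c
where n.oid = c.relnamespace
  and c.relkind = 'r'          -- ordinary tables only
  and n.nspname = 'my_schema'
order by size_bytes desc;
```

This tells you which tables are large, but it is not a substitute for count(*) if you need exact row counts.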
If your list of tables is static, you can query it like this:
select 'table1', count(*), max(time) from table1
union all
select 'table2', count(*), max(time) from table2
union all
...
select 'table52', count(*), max(time) from table52;
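This solution is inflexible: whenever the list of tables changes, you need to rewrite the query.
The second option is to generate this query and then execute it manually: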
select string_agg(query, ' union all ') as query
from (
select 'select ''' || n.nspname || '.' || c.relname || ''', count(*), max(time) from ' || n.nspname || '.' || c.relname as query
from pg_namespace as n, pg_class as c
where n.oid = c.relnamespace
and c.relkind = 'r' -- ordinary tables only; pg_class also lists indexes, views and sequences
and n.nspname = 'my_schema'
) as q;
This is more flexible, but the generated query still has to be executed manually as a second step.
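A possible variant of the generator, in case it helps: it emits one statement per row instead of aggregating, and it quotes identifiers with quote_ident() so that mixed-case or otherwise unusual table names survive. This is only a sketch; 'my_schema' is a placeholder, and the column name time is taken from the question. The one-row-per-table form is also a fallback if string_agg() is not available in your build (stock PostgreSQL only gained it in 9.0); you would then join the rows with union all yourself.

```sql
-- Sketch only: 'my_schema' is a placeholder schema name.
-- Produces one "select ... from schema.table" statement per ordinary table.
select 'select ' || quote_literal(n.nspname || '.' || c.relname)
       || ', count(*), max(time) from '
       || quote_ident(n.nspname) || '.' || quote_ident(c.relname) as query
from pg_namespace as n, pg_class as c
where n.oid = c.relnamespace
  and c.relkind = 'r'          -- ordinary tables only
  and n.nspname = 'my_schema'
order by c.relname;
```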
The last option is to write a function for this:
create or replace function table_sizes (schemaname varchar) returns setof record as $BODY$
declare
r record;
t varchar;
begin
for t in execute $$
select n.nspname || '.' || c.relname
from pg_namespace as n, pg_class as c
where n.oid = c.relnamespace
and c.relkind = 'r'
and n.nspname = '$$ || schemaname || $$'$$
loop
execute 'select ''' || t || '''::varchar, count(*), max(time) from ' || t
into r;
return next r;
end loop;
return;
end;
$BODY$ language plpgsql volatile;
select * from table_sizes('public') t(tablename varchar, rowcount bigint, maxtime time);
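One thing to check before calling the function: it assumes every table in the schema has a column named time, and the type in the column definition list (maxtime above) has to match that column's actual type. A catalog query along these lines, again only a sketch with 'public' as the example schema, shows which tables have such a column and what its declared type is:

```sql
-- Sketch only: lists tables in the schema that have a column called "time"
-- together with that column's declared type.
select n.nspname || '.' || c.relname as table_name,
       format_type(a.atttypid, a.atttypmod) as time_column_type
from pg_namespace as n, pg_class as c, pg_attribute as a
where n.oid = c.relnamespace
  and a.attrelid = c.oid
  and c.relkind = 'r'          -- ordinary tables only
  and n.nspname = 'public'
  and a.attname = 'time'
  and a.attnum > 0             -- skip system columns
  and not a.attisdropped       -- skip dropped columns
order by c.relname;
```

If the reported type is, say, a timestamp, the call would need maxtime declared with that type instead of time.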
Another way to do it is to run the following steps in psql:
\o count_per_schema.sql
select 'select count(*)as '||c.relname||', max(time) from ' || n.nspname || '.' || c.relname || ';' as " " from pg_namespace as n, pg_class as c where n.oid = c.relnamespace and c.relkind='r' and n.nspname = 'schema_name';
\o
\i count_per_schema.sql
\o redirects the query results to the file name you provide, for example count_per_schema.sql, and \i runs all the queries from that file.
This is what I did on my server. I did not select max(time).
yogesh=# \o count_per_schema.sql
yogesh=# select 'select count(*)as '||c.relname||' from ' || n.nspname || '.' || c.relname || ';' as " " from pg_namespace as n, pg_class as c where n.oid = c.relnamespace and c.relkind='r' and n.nspname = 'public';
yogesh=# \o
yogesh=# \i count_per_schema.sql
heap1
-------
20000
(1 row)
test
-------
4
(1 row)
users
-------
0
(1 row)
skew_demo
-------
10609
(1 row)