Restricting the size of a schema in Amazon Redshift

We use Amazon Redshift in our project.

We allocate a separate schema to each team. For example, marketing gets its own schema to store their tables for analysis, and the sales team gets another.

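What happens is that analysts from one group use up most of the database's space with tables that are mostly temporary, and they don't bother to delete or purge them. The discipline of maintaining a schema is left to each schema owner, so every now and then we end up doing housekeeping ourselves.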

I'd like to know whether we can configure a size limit per schema or database. Say, allocate 100 GB to the sales schema, 50 GB to marketing, and so on.

According to the Redshift documentation, Redshift doesn't appear to offer a way to limit the size of each schema or database, but there is a workaround.

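You can get the data size of each table with the queries below, so you can write a script that monitors usage and sends an alert when a limit is exceeded. Then just run the script periodically via cron.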

  • Query to get the data size and row count of each table
select
  trim(pgdb.datname) as database, trim(pgn.nspname) as schema,
  trim(a.name) as Table, b.mbytes, a.rows
from
  (select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name) as a
  join pg_class as pgc on pgc.oid = a.id
  join pg_namespace as pgn on pgn.oid = pgc.relnamespace
  join pg_database as pgdb on pgdb.oid = a.db_id
  join (select tbl, count(*) as mbytes from stv_blocklist group by tbl) b on a.id=b.tbl
order by 1, 2, 3;
  • Example result
 database |     schema    |    table    | mbytes |   rows
----------+---------------+-------------+--------+----------
 test_db  | dev_schema_1  | click_log   |     23 |     4653
 prod_db  | prod_schema_1 | click_log   |  16217 |  2112354
 prod_db  | prod_schema_1 | install_log |   5544 |   433538
  • Query to get the data size and row count of each schema
select
  trim(pgdb.datname) as database, trim(pgn.nspname) as schema,
  sum(b.mbytes) as mbytes, sum(a.rows) as rows
from
  (select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name) as a
  join pg_class as pgc on pgc.oid = a.id
  join pg_namespace as pgn on pgn.oid = pgc.relnamespace
  join pg_database as pgdb on pgdb.oid = a.db_id
  join (select tbl, count(*) as mbytes from stv_blocklist group by tbl) b on a.id=b.tbl
group by pgdb.datname, pgn.nspname
order by 1, 2;
  • Query to get the data size and row count of each database
select
  trim(pgdb.datname) as database, sum(b.mbytes) as mbytes, sum(a.rows) as rows
from
  (select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name) as a
  join pg_class as pgc on pgc.oid = a.id
  join pg_namespace as pgn on pgn.oid = pgc.relnamespace
  join pg_database as pgdb on pgdb.oid = a.db_id
  join (select tbl, count(*) as mbytes from stv_blocklist group by tbl) b on a.id=b.tbl
group by pgdb.datname
order by 1;
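
As a rough sketch of the monitoring script described above: it assumes the rows of the per-schema query have already been fetched, with whatever driver you use, as (database, schema, mbytes, rows) tuples. The QUOTAS_MB table, the schema names in it, and the find_over_quota function are hypothetical names made up for illustration, not anything Redshift provides.

```python
# Hypothetical per-schema limits in MB. The queries above count 1 MB blocks
# from stv_blocklist, so mbytes is directly comparable to these values.
QUOTAS_MB = {
    ("prod_db", "sales"): 100 * 1024,
    ("prod_db", "marketing"): 50 * 1024,
}


def find_over_quota(schema_rows, quotas_mb):
    """Return alert messages for schemas whose size exceeds their limit.

    schema_rows: iterable of (database, schema, mbytes, rows) tuples, i.e.
    the column layout of the per-schema query.
    Schemas without a configured limit are skipped.
    """
    alerts = []
    for database, schema, mbytes, _rows in schema_rows:
        limit = quotas_mb.get((database, schema))
        if limit is not None and mbytes > limit:
            alerts.append(
                f"{database}.{schema} uses {mbytes} MB, limit is {limit} MB"
            )
    return alerts


if __name__ == "__main__":
    # Made-up sample rows; a real job would pass the query result instead.
    sample = [
        ("prod_db", "sales", 120000, 2545892),
        ("prod_db", "marketing", 4096, 10000),
    ]
    for message in find_over_quota(sample, QUOTAS_MB):
        print(message)  # swap for mail or chat notification in the cron job
```

Keeping the comparison separate from the database access means the check can be tried out on sample rows before wiring it to the cluster and to cron.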

This feature now exists in Redshift: Redshift Create Schema Docs

The relevant example from the docs:

create schema us_sales authorization dwuser QUOTA 50 GB;