Restricting the size of a schema in Amazon Redshift
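We use Amazon Redshift in our project.
We have allocated a separate schema to each team. For example, marketing gets its own schema to store its tables for analysis, and the sales team gets its own.
What is happening is that analysts from one group use up most of the database's space with tables that are largely temporary in nature, and they do not bother to delete or purge them. The discipline of maintaining a schema is therefore left to the individual schema owners, and every now and then we end up doing the housekeeping ourselves.
I would like to know whether we can configure a size limit per schema or database. For example, allocate 100 GB to the sales schema, 50 GB to marketing, and so on.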
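According to the Redshift documentation, Redshift does not appear to provide a feature for limiting the size of each schema or database, but there is a workaround.
Since you can get the data size of each table with the queries below, you can write a script that monitors usage and sends an alert when a limit is exceeded. Then run the script periodically via cron.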
- Query to get the data size and row count of each table
select
trim(pgdb.datname) as database, trim(pgn.nspname) as schema,
trim(a.name) as Table, b.mbytes, a.rows
from
(select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes from stv_blocklist group by tbl) b on a.id=b.tbl
order by 1, 2, 3;
- Example result
 database |    schema     |    table    | mbytes |  rows
----------+---------------+-------------+--------+---------
 test_db  | dev_schema_1  | click_log   |     23 |    4653
 prod_db  | prod_schema_1 | click_log   |  16217 | 2112354
 prod_db  | prod_schema_1 | install_log |   5544 |  433538
- Query to get the data size and row count of each schema
select
trim(pgdb.datname) as database, trim(pgn.nspname) as schema,
sum(b.mbytes) as mbytes, sum(a.rows) as rows
from
(select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes from stv_blocklist group by tbl) b on a.id=b.tbl
group by pgdb.datname, pgn.nspname
order by 1, 2;
- Query to get the data size and row count of each database
select
trim(pgdb.datname) as database, sum(b.mbytes) as mbytes, sum(a.rows) as rows
from
(select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes from stv_blocklist group by tbl) b on a.id=b.tbl
group by pgdb.datname
order by 1;
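The monitoring script suggested above could be sketched roughly as follows. This is only an illustration: it assumes the per-schema query has already been run and that its first three columns (database, schema, mbytes) are passed in as tuples. The `QUOTAS_MB` mapping, the schema names in it, and the function name `find_over_quota` are hypothetical and not part of Redshift. Connecting to the cluster and sending the alert are left out, since they depend on your driver and notification channel.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-schema limits in MB (1 GB = 1024 MB); names are illustrative.
QUOTAS_MB: Dict[str, int] = {
    "sales": 100 * 1024,
    "marketing": 50 * 1024,
}


def find_over_quota(
    usage_rows: Iterable[Tuple[str, str, int]],
    quotas_mb: Dict[str, int],
) -> List[str]:
    """Return one alert message per schema whose usage exceeds its limit.

    usage_rows holds (database, schema, mbytes) tuples, i.e. the first three
    columns of the per-schema query. Schemas without a configured limit are
    skipped.
    """
    alerts = []
    for database, schema, mbytes in usage_rows:
        limit = quotas_mb.get(schema)
        if limit is not None and mbytes > limit:
            alerts.append(
                f"{database}.{schema} uses {mbytes} MB, limit is {limit} MB"
            )
    return alerts


if __name__ == "__main__":
    # Sample rows standing in for a real query result.
    sample = [("prod_db", "sales", 120000), ("prod_db", "marketing", 2048)]
    for message in find_over_quota(sample, QUOTAS_MB):
        print(message)  # replace with mail, a chat webhook, etc.
```

A cron entry would then run the script at whatever interval suits the cluster. Keeping the comparison in a pure function makes the limits easy to check without a live connection.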
This feature now exists in Redshift:
Redshift Create Schema Docs
Relevant example from the docs:
create schema us_sales authorization dwuser QUOTA 50 GB;