Is it possible to partition nested tables in BigQuery?
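I'm currently migrating my data warehouse to BigQuery. I've been denormalizing the database because, from what I've read, that yields more efficient and cheaper queries. However, this has left me with some nested tables. If every nested table has a "created_at" and a "last_modified_at" column, is there any way to use either of those values to partition my tables?
No, you cannot partition a table by a field of a nested table. According to the docs: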
You can partition BigQuery tables by:
Time-unit column: Tables are partitioned based on a TIMESTAMP, DATE, or DATETIME column in the table.
Ingestion time: Tables are partitioned based on the timestamp when BigQuery ingests the data.
Integer range: Tables are partitioned based on an integer column.
In addition, the partitioning column must be a top-level field; it cannot be a leaf field of a RECORD (STRUCT):
Limitations
You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.
Time-unit column-partitioned tables are subject to the following limitations:
The partitioning column must be either a scalar DATE, TIMESTAMP, or DATETIME column. While the mode of the column can be REQUIRED or NULLABLE, it cannot be REPEATED (array-based). The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.
Integer-range partitioned tables are subject to the following limitations:
The partitioning column must be an INTEGER column. While the mode of the column may be REQUIRED or NULLABLE, it cannot be REPEATED (array-based). The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.
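The rules quoted above can be expressed as a small check. This is only a sketch, not BigQuery's own validation: the function name `can_partition_on` and the sample schema are hypothetical, and the schema is assumed to be a list of dicts with `name`, `type`, `mode` and `fields` keys, in the shape of a BigQuery JSON schema file.

```python
# Column types the quoted docs allow for time-unit and integer-range partitioning.
PARTITION_TYPES = {"DATE", "TIMESTAMP", "DATETIME", "INTEGER", "INT64"}

# Hypothetical schema: one top-level timestamp, one repeated date,
# and a repeated RECORD ("nested table") with its own created_at.
SCHEMA = [
    {"name": "order_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {"name": "ship_dates", "type": "DATE", "mode": "REPEATED"},
    {
        "name": "items",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ],
    },
]


def can_partition_on(schema, path):
    """Return True if the dotted column path satisfies the quoted rules."""
    if "." in path:
        # A dotted path points at a leaf field of a RECORD (STRUCT): not allowed.
        return False
    for field in schema:
        if field["name"] == path:
            mode = field.get("mode", "NULLABLE")
            # Must be a scalar of an allowed type and must not be REPEATED.
            return field["type"] in PARTITION_TYPES and mode != "REPEATED"
    return False  # Column is not in the schema at all.


print(can_partition_on(SCHEMA, "created_at"))
print(can_partition_on(SCHEMA, "items.created_at"))
```

The first call is accepted because `created_at` is a top-level scalar TIMESTAMP; the second is rejected because `items.created_at` is a leaf of a RECORD, which is exactly the case in the question.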
Although clustering in BigQuery supports more data types than partitioning, you cannot cluster a table on RECORD (STRUCT) columns either:
Clustering columns must be top-level, non-repeated columns of one of the following types:
DATE, BOOL, GEOGRAPHY, INT64, NUMERIC, BIGNUMERIC, STRING, TIMESTAMP, DATETIME
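If you're partitioning to make date/time queries more efficient, and each nested table covers a similar time range, I'd suggest un-nesting the table into the parent table. If you don't want to un-nest it, it may help to add another column to your main table that holds the earliest or latest date from the nested table, and to partition by that new column.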
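As a rough illustration of the second suggestion, the sketch below derives a top-level column from the nested rows before loading, and includes a DDL string that does the same thing inside BigQuery. All names here (`my_dataset.orders`, `items`, `earliest_item_created_at`, `add_partition_column`) are hypothetical, and the SQL has not been run against a real dataset, so treat it as a starting point only.

```python
from datetime import datetime, timezone

# Hypothetical DDL: copy the table, adding a top-level TIMESTAMP column
# computed from the nested rows, and partition by its date.
DDL = """
CREATE TABLE my_dataset.orders_partitioned
PARTITION BY DATE(earliest_item_created_at) AS
SELECT
  o.*,
  (SELECT MIN(i.created_at) FROM UNNEST(o.items) AS i) AS earliest_item_created_at
FROM my_dataset.orders AS o
"""


def add_partition_column(row, nested_field="items", source="created_at",
                         target="earliest_item_created_at"):
    """Return a copy of row with a top-level column holding the earliest nested timestamp."""
    stamps = [
        item[source]
        for item in row.get(nested_field, [])
        if item.get(source) is not None
    ]
    out = dict(row)  # shallow copy so the input row is left unchanged
    out[target] = min(stamps) if stamps else None
    return out


row = {
    "order_id": "A1",
    "items": [
        {"sku": "x", "created_at": datetime(2023, 1, 5, tzinfo=timezone.utc)},
        {"sku": "y", "created_at": datetime(2023, 1, 2, tzinfo=timezone.utc)},
    ],
}
print(add_partition_column(row)["earliest_item_created_at"])
```

Swapping `min` for `max` (and `MIN` for `MAX` in the SQL) gives the latest date instead. Queries then need to filter on the new top-level column for partition pruning to apply; filtering only on the nested `created_at` would still scan every partition.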