How to generate a single file per partition - Snowflake COPY into location
I've managed to unload my data by partition, but each partition is also split into multiple files. Is there a way to force Snowflake to generate a single file per partition?
It would also be nice if all of the files were compressed.
Here's what I have so far:
COPY INTO 'gcs_bucket'
FROM test
PARTITION BY TRUNC(number_of_rows/500000)
STORAGE_INTEGRATION = gcs_int
FILE_FORMAT = (TYPE = CSV, COMPRESSION = gzip, NULL_IF = ('NULL','null'), RECORD_DELIMITER = '\r\n', FIELD_OPTIONALLY_ENCLOSED_BY = '\'')
HEADER = TRUE
PS. I'm using CSV format (this can't be changed).
You can change the upper size limit of each file with the MAX_FILE_SIZE option. The default is 16MB.
COPY INTO 'gcs_bucket'
FROM test
PARTITION BY TRUNC(number_of_rows/500000)
STORAGE_INTEGRATION = gcs_int
...
MAX_FILE_SIZE = 167772160 -- (160MB)
Definition
Number (> 0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread. Note that the actual file size and number of files unloaded are determined by the total amount of data and number of nodes available for parallel processing.
Snowflake utilizes parallel execution to optimize performance. The number of threads cannot be modified.
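Since MAX_FILE_SIZE is a byte limit while the PARTITION BY expression above buckets by row count, it helps to do a quick back-of-the-envelope calculation so each 500,000-row partition fits in a single file. A minimal sketch of that estimate (the ~300 bytes/row figure and the helper name are assumptions for illustration, not from the question):

```python
def estimate_max_file_size(rows_per_partition, avg_row_bytes, safety_factor=2.0):
    """Return a MAX_FILE_SIZE value in bytes with headroom for row-size variance.

    rows_per_partition: rows in each PARTITION BY bucket (500,000 above).
    avg_row_bytes: estimated average uncompressed row size (an assumption here).
    safety_factor: multiplier so occasional wide rows don't split the file.
    """
    return int(rows_per_partition * avg_row_bytes * safety_factor)

# Hypothetical numbers: 500,000 rows per partition, ~300 bytes per row.
limit = estimate_max_file_size(500_000, 300)
print(limit)  # 300000000 bytes (~286 MB)
```

If the computed limit exceeds your data-per-partition, Snowflake has no reason to split the partition across files, which is the effect the MAX_FILE_SIZE = 167772160 example above relies on.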