Snowflake: can SQS/SNS provide a granular path for COPY INTO?

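I'm loading data into Snowflake from an S3 folder that has many subfolders. Because of design constraints, I can't change the folder structure or delete files after they're loaded. Some ELT best-practice guides I've read recommend loading from as granular a path as possible, like this: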

-- Simple method:  Scan the entire stage
copy into sales_table
  from @landing_data
  pattern='.*[.]csv';

-- Most Flexible method:  Limit within directory
copy into sales_table
  from @landing_data/sales/transactions/2020/05
  pattern='.*[.]csv';

-- Fastest method:  A named file
copy into sales_table
  from @landing_data/sales/transactions/2020/05/sales_050.csv;

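However, as mentioned above, the most granular path I can use is @landing_data/sales/transactions, which keeps growing by date, so load performance will degrade over time. The guide to using an SNS topic says: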

Note that the pipe will only copy files to the ingest queue triggered by event notifications via the SNS topic.

I have a few questions:

If I understand correctly, this means SNS provides Snowpipe with the path of each new file, so the loading process already uses a granular path?

Correct. From https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html#step-3-create-a-pipe-with-auto-ingest-enabled:

Data files are loaded in a stage.

An S3 event notification informs Snowpipe via an SQS queue that files are ready to load. Snowpipe copies the files into a queue.

A Snowflake-provided virtual warehouse loads data from the queued files into the target table based on parameters defined in the specified pipe.

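The key phrase is "loads data from the queued files", which is what you're looking for here. Because each notification identifies a specific file, Snowpipe doesn't have to list the contents of the folder, and that listing is the main cause of the performance problems with non-granular paths.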
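For illustration, a minimal sketch of an auto-ingest pipe, assuming the existing `@landing_data` stage and `sales_table` from the question. The pipe name `sales_pipe` is hypothetical and `<sns_topic_arn>` is a placeholder for your own topic. The COPY statement can name the broad `sales/transactions` prefix, because the pipe loads the specific files announced by the notifications rather than scanning the prefix.

```sql
-- Hypothetical pipe; replace <sns_topic_arn> with the ARN of your SNS topic.
create pipe sales_pipe
  auto_ingest = true
  aws_sns_topic = '<sns_topic_arn>'
  as
  -- Broad prefix is fine here: each SNS notification names the exact file to load.
  copy into sales_table
    from @landing_data/sales/transactions
    file_format = (type = 'CSV');
```

The SNS topic still has to be wired to the S3 bucket's event notifications and to Snowflake's queue as described in the linked guide; this sketch only shows the pipe definition.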

Note that you don't need Snowpipe for this: COPY INTO has a FILES option that lets you specify individual files.
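A sketch of the FILES approach, assuming the same stage and table as in the question. The file names are hypothetical (only `sales_050.csv` appears above). You would supply them from however you track newly arrived files. The names are given relative to the location in the FROM clause, so the broad prefix can stay as-is while only the listed files are read.

```sql
-- Hypothetical file list; names are relative to @landing_data/sales/transactions.
copy into sales_table
  from @landing_data/sales/transactions
  files = ('2020/05/sales_050.csv', '2020/05/sales_051.csv')
  file_format = (type = 'CSV');
```

A single COPY statement accepts a limited number of names in FILES (1,000 at the time of writing), so larger batches would need to be split across several statements.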