Loading a Redshift table from multiple S3 folders using a manifest

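I am using the COPY command with a manifest to load a Redshift table from S3.

The requirement is to load multiple files spread across multiple folders, for example: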

Path1 : s3://bucket_name/folder_name/folder_1/folder/part*.parquet
Path2 : s3://bucket_name/folder_name/folder_2/folder/part*.parquet
Path3 : s3://bucket_name/folder_name/folder_3/folder/part*.parquet

Each path will contain ~1000 files.

How do I create a manifest to load them?

I created a manifest as follows:

{
    "fileLocations": [ 
{"url":"s3://bucket_name/folder_name/folder_1/folder/part*.parquet", "mandatory":false},

 {"url":"s3://bucket_name/folder_name/folder_3/folder/part*.parquet", "mandatory":false},

 {"url":"s3://bucket_name/folder_name/folder_2/folder/part*.parquet", "mandatory":false},

 ]
}

But I get an error:

Manifest does not contain a list of files.

From "Using a manifest to specify data files - Amazon Redshift":

The following example shows the JSON to load files from different buckets and with file names that begin with date stamps:

{
  "entries": [
    {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
    {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
    {"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
    {"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
  ]
}

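The problem is probably that you are using fileLocations instead of entries as the top-level key.

I also suspect that wildcards are not allowed in a manifest; each url should be the full object path of a single file.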
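If wildcards are indeed not accepted, one way around it is to list the objects yourself and generate the manifest. Below is a minimal sketch in Python, not a definitive implementation. The helper names build_manifest and list_objects are made up for illustration. It assumes boto3 is installed and that each listed object is a dict with "Key" and "Size" fields, as returned by the S3 list_objects_v2 call. As far as I know, Redshift also expects a meta.content_length value on every entry when loading Parquet through a manifest, so the sketch includes it.

```python
import fnmatch
import json


def build_manifest(bucket, objects, pattern="part*.parquet"):
    """Build a Redshift-style manifest dict from S3 object listings.

    objects: iterable of dicts with "Key" and "Size" (the shape that
    list_objects_v2 returns). Only file names matching `pattern` are kept.
    """
    entries = []
    for obj in objects:
        # Match the pattern against the file name only, not the whole key.
        name = obj["Key"].rsplit("/", 1)[-1]
        if fnmatch.fnmatchcase(name, pattern):
            entries.append({
                "url": f"s3://{bucket}/{obj['Key']}",
                "mandatory": True,
                # Assumption: needed for columnar formats such as Parquet.
                "meta": {"content_length": obj["Size"]},
            })
    return {"entries": entries}


def list_objects(bucket, prefixes):
    """Yield every object under each prefix (hypothetical helper)."""
    import boto3  # imported lazily so build_manifest works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for prefix in prefixes:
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            yield from page.get("Contents", [])
```

You would call it with the three folders from the question as prefixes, e.g. build_manifest("bucket_name", list_objects("bucket_name", ["folder_name/folder_1/folder/", "folder_name/folder_2/folder/", "folder_name/folder_3/folder/"])), write the result to a file with json.dumps, upload that file to S3, and point COPY at it with the MANIFEST option.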