使用清单从多个 s3 文件夹加载 redshift table
Loading redshift table from multiple s3 folder using manifest
我正在使用复制命令从 s3 使用清单加载 Redshift table。
要求是加载多个文件(跨多个文件夹),例如
Path1 : s3://bucket_name/folder_name/folder_1/folder/part*.parquet
Path2 : s3://bucket_name/folder_name/folder_2/folder/part*.parquet
Path3 : s3://bucket_name/folder_name/folder_3/folder/part*.parquet
每个路径将有 ~1000 个文件
如何创建清单来加载它?
我创建了一个清单如下:
{
"fileLocations": [
{"url":"s3://bucket_name/folder_name/folder_1/folder/part*.parquet", "mandatory":false},
{"url":"s3://bucket_name/folder_name/folder_3/folder/part*.parquet", "mandatory":false},
{"url":"s3://bucket_name/folder_name/folder_2/folder/part*.parquet", "mandatory":false},
]
}
但我收到一个错误:
Manifest does not contain a list of files.
来自Using a manifest to specify data files - Amazon Redshift:
The following example shows the JSON to load files from different buckets and with file names that begin with date stamps:
{
"entries": [
{"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
]
}
问题可能是您使用 fileLocations
与 entries
。
我也怀疑不允许使用通配符。
我正在使用复制命令从 s3 使用清单加载 Redshift table。
要求是加载多个文件(跨多个文件夹),例如
Path1 : s3://bucket_name/folder_name/folder_1/folder/part*.parquet
Path2 : s3://bucket_name/folder_name/folder_2/folder/part*.parquet
Path3 : s3://bucket_name/folder_name/folder_3/folder/part*.parquet
每个路径将有 ~1000 个文件
如何创建清单来加载它?
我创建了一个清单如下:
{
"fileLocations": [
{"url":"s3://bucket_name/folder_name/folder_1/folder/part*.parquet", "mandatory":false},
{"url":"s3://bucket_name/folder_name/folder_3/folder/part*.parquet", "mandatory":false},
{"url":"s3://bucket_name/folder_name/folder_2/folder/part*.parquet", "mandatory":false},
]
}
但我收到一个错误:
Manifest does not contain a list of files.
来自Using a manifest to specify data files - Amazon Redshift:
The following example shows the JSON to load files from different buckets and with file names that begin with date stamps:
{
"entries": [
{"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
]
}
问题可能是您使用 fileLocations
与 entries
。
我也怀疑不允许使用通配符。