Redshift COPY of JSON from S3 fails
Following this documentation, I am trying to load JSON data from S3 into Redshift.
I created a JSONPaths file and validated it (at https://jsonpath.curiousconcept.com/# using the expression $.*)
{
"jsonpaths": [
"$['_record_id']",
"$['_title']",
"$['_server_updated_at']",
"$['_project']",
"$['_assigned_to']",
"$['_updated_by']",
"$['_latitude']",
"$['_longitude']",
"$['date']",
"$['date_received']",
"$['inspection_type']"
]
}
and the sample data
[{
"_record_id": "cf68c930-b7c8-4c3f-a04c-58b49f383cca",
"_title": "FAIL, 128",
"_server_updated_at": "2021-08-03T15:06:05.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": {
"type": "Point",
"coordinates": [-74.5048900706, 40.3395964363]
},
"_latitude": 40.3395964363,
"_longitude": -74.5048900706,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}, {
"_record_id": "9c8af79a-eaaf-405e-8c42-62560fdf15d5",
"_title": "PASS, 52",
"_server_updated_at": "2021-08-03T14:56:23.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": null,
"_latitude": null,
"_longitude": null,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}]
When I run this COPY command
copy rab.rab_dbo.shipmentreceivinglog2
from 's3://<bucket>/data_report.json'
iam_role 'arn:aws:iam::1234567890:role/RedshiftFileTransfer'
json 's3://<bucket>g/JSONPaths.json';
I get ERROR: Load into table 'shipmentreceivinglog2' failed. Check 'stl_load_errors' system table for details.
When I run select * from stl_load_errors;
I see
Invalid JSONPath format: Member is not an object.
for s3://<bucket>/data_report.json
What is wrong with my JSONPaths file?
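The problem is with your data file, not the JSONPaths file. Redshift JSON input needs to be a set of JSON records simply concatenated one after another. Your file is a single JSON array of objects, and an array is one top-level value rather than a series of records. You need to remove the enclosing [] and the commas between the elements. Your sample data should look like this: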
{
"_record_id": "cf68c930-b7c8-4c3f-a04c-58b49f383cca",
"_title": "FAIL, 128",
"_server_updated_at": "2021-08-03T15:06:05.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": {
"type": "Point",
"coordinates": [-74.5048900706, 40.3395964363]
},
"_latitude": 40.3395964363,
"_longitude": -74.5048900706,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}
{
"_record_id": "9c8af79a-eaaf-405e-8c42-62560fdf15d5",
"_title": "PASS, 52",
"_server_updated_at": "2021-08-03T14:56:23.000Z",
"_project": null,
"_assigned_to": null,
"_updated_by": "XYZ",
"_geometry": null,
"_latitude": null,
"_longitude": null,
"date": "2021-08-03T00:00:00.000Z",
"date_received": "2021-07-30T00:00:00.000Z",
"inspection_type": "New Product Inspection"
}
An easy way to do this is to run your existing JSON through jq:
jq '.[]' file.json
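If jq is not available, the same conversion can be sketched in Python. This is only a minimal illustration, assuming the whole file fits in memory; the function name array_to_json_records and the shortened sample string are made up for the example and are not part of Redshift or of the original data.

```python
import json


def array_to_json_records(text: str) -> str:
    """Turn a JSON array of objects into one JSON object per line.

    Hypothetical helper for illustration: it drops the enclosing []
    and the commas between elements by re-serializing each element.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One compact object per line, with no separators between records.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Shortened stand-in for the sample data in the question.
sample = (
    '[{"_record_id": "cf68c930", "_latitude": 40.3395964363},'
    ' {"_record_id": "9c8af79a", "_latitude": null}]'
)

converted = array_to_json_records(sample)
print(converted, end="")
```

The converted text would then be written to a new file and uploaded to S3 in place of the original array file before rerunning COPY.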