Redshift COPY of JSON from S3 fails

Following the documentation, I am trying to load JSON data from S3 into Redshift. I created a JSONPaths file and validated it (at https://jsonpath.curiousconcept.com/# using the expression $.*):

{
    "jsonpaths": [
        "$['_record_id']",
        "$['_title']",
        "$['_server_updated_at']",
        "$['_project']",
        "$['_assigned_to']",
        "$['_updated_by']",
        "$['_latitude']",
        "$['_longitude']",
        "$['date']",
        "$['date_received']",
        "$['inspection_type']"
    ]
}

and this is the sample data:

[{
    "_record_id": "cf68c930-b7c8-4c3f-a04c-58b49f383cca",
    "_title": "FAIL, 128",
    "_server_updated_at": "2021-08-03T15:06:05.000Z",
    "_project": null,
    "_assigned_to": null,
    "_updated_by": "XYZ",
    "_geometry": {
        "type": "Point",
        "coordinates": [-74.5048900706, 40.3395964363]
    },
    "_latitude": 40.3395964363,
    "_longitude": -74.5048900706,
    "date": "2021-08-03T00:00:00.000Z",
    "date_received": "2021-07-30T00:00:00.000Z",
    "inspection_type": "New Product Inspection"
}, {
    "_record_id": "9c8af79a-eaaf-405e-8c42-62560fdf15d5",
    "_title": "PASS, 52",
    "_server_updated_at": "2021-08-03T14:56:23.000Z",
    "_project": null,
    "_assigned_to": null,
    "_updated_by": "XYZ",
    "_geometry": null,
    "_latitude": null,
    "_longitude": null,
    "date": "2021-08-03T00:00:00.000Z",
    "date_received": "2021-07-30T00:00:00.000Z",
    "inspection_type": "New Product Inspection"
}]

When I run this COPY command

copy rab.rab_dbo.shipmentreceivinglog2
from 's3://<bucket>/data_report.json'
iam_role 'arn:aws:iam::1234567890:role/RedshiftFileTransfer'
json 's3://<bucket>g/JSONPaths.json';

I get ERROR: Load into table 'shipmentreceivinglog2' failed. Check 'stl_load_errors' system table for details. When I run select * from stl_load_errors; I see

Invalid JSONPath format: Member is not an object. for s3://<bucket>/data_report.json

What is wrong with my JSONPaths file?

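The problem is with your data file, not the JSONPaths file. Redshift expects JSON input to be a sequence of JSON records simply concatenated one after another. Your file is a single JSON array of objects, and an array is one value, so each JSONPath expression is evaluated against the array rather than against an individual record. You need to remove the enclosing [] and the commas between elements. Your sample data should look like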

{
    "_record_id": "cf68c930-b7c8-4c3f-a04c-58b49f383cca",
    "_title": "FAIL, 128",
    "_server_updated_at": "2021-08-03T15:06:05.000Z",
    "_project": null,
    "_assigned_to": null,
    "_updated_by": "XYZ",
    "_geometry": {
        "type": "Point",
        "coordinates": [-74.5048900706, 40.3395964363]
    },
    "_latitude": 40.3395964363,
    "_longitude": -74.5048900706,
    "date": "2021-08-03T00:00:00.000Z",
    "date_received": "2021-07-30T00:00:00.000Z",
    "inspection_type": "New Product Inspection"
}
{
    "_record_id": "9c8af79a-eaaf-405e-8c42-62560fdf15d5",
    "_title": "PASS, 52",
    "_server_updated_at": "2021-08-03T14:56:23.000Z",
    "_project": null,
    "_assigned_to": null,
    "_updated_by": "XYZ",
    "_geometry": null,
    "_latitude": null,
    "_longitude": null,
    "date": "2021-08-03T00:00:00.000Z",
    "date_received": "2021-07-30T00:00:00.000Z",
    "inspection_type": "New Product Inspection"
}

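A simple way to do this is to pipe your existing JSON through jq, which emits each array element as a separate top-level object: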

jq '.[]' file.json
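
If jq is not available, the same conversion can be sketched in Python. This is only a minimal illustration, assuming the whole file fits in memory; the function name array_to_records is made up for this example and is not part of Redshift or any library.

```python
import json


def array_to_records(text: str) -> str:
    """Turn a JSON array of objects into one JSON object per line.

    Hypothetical helper for illustration: it drops the enclosing []
    and the commas between elements by re-serializing each element.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One serialized record per line, with no commas between records.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Small demo with made-up data in the same shape as the question's file.
sample = '[{"_title": "FAIL, 128", "_latitude": 40.5}, {"_title": "PASS, 52", "_latitude": null}]'
print(array_to_records(sample), end="")
```

Write the returned text to a new file, upload it to S3, and point the COPY command at that file in place of the original array file.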