Amazon Athena gives Internal Error while parsing nested JSON

I'm trying to query this JSON file (for debugging purposes it contains only one row!):

{
  "appVersion": null,
  "sessionIndex": "3",
  "psdkLang": null,
  "lamdbaAwsRequestId": "bb04330c-e1e7-4bbd-97b8-86fdb2ee0b7f",
  "bundleID": "xyz",
  "receiveTimestamp": "2017-03-31T01:45:30.796Z",
  "type": "logEvent",
  "userIdfv": null,
  "osVersion": null,
  "uniqueIndex": "9c6c3927-aa66-4974-adac-fd10fc83a1e5",
  "userIdfa": null,
  "eventName": "Rewarded Ads Ad Is Ready",
  "deviceType": null,
  "eventId": "shardId-000000000005:49571690399037302251611429510623174446442870333536993362",
  "store1": "google",
  "deviceLang": null,
  "geoCode": null,
  "sessionId": "34B4CEC8-9AA0-40DD-94C4-C5420F563F68",
  "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}",
  "gameVersion": null,
  "internetConnectionState": null,
  "deviceModel": null,
  "deviceTimeZone": null,
  "time": "2017-03-31T10:44:50.117+0900",
  "userId": "24176983"
}

I created a table in Amazon Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
  `appversion` string,
  `psdklang` string,
  `bundleid` string,
  `receivetimestamp` string,
  `type` string,
  `osversion` string,
  `store1` string,
  `devicelang` string,
  `geocode` string,
  `sessionid` string,
  `eventName` string,
  `params` map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'  
) LOCATION 's3://...'
TBLPROPERTIES ('has_encrypted_data'='false');

When I run this query:
select eventname from RAAIR;
everything works fine.

When I try to access the nested JSON (the params element):
select params['AdIsReady'] from RAAIR;
I get an "Internal error" message.

What am I missing here?

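You mentioned in a comment that params contains backslashes for escaping.
That is because params is a string, not a nested object. Athena cannot deserialize a string value directly into a MAP column, so you get the "Internal error" message.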
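To see the difference, it can help to parse one record outside Athena. The Python sketch below is only an illustration of the data's shape, not what Athena or the OpenX SerDe do internally: a single JSON parse yields a string for params, and a second parse is needed before AdIsReady can be read. RAW_LINE, NESTED_LINE, params_type and ad_is_ready are hypothetical names, and the records are trimmed to two fields from the question's data.

```python
import json

# One record in the shape of the question's data, trimmed to two fields.
# The value of "params" is a JSON *string* with escaped quotes,
# not a nested JSON object.
RAW_LINE = r'{"eventName": "Rewarded Ads Ad Is Ready", "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}'

# The shape a map<string,string> column would need: a nested object.
NESTED_LINE = '{"eventName": "Rewarded Ads Ad Is Ready", "params": {"AdProvider": "AdColony", "AdIsReady": "false"}}'


def params_type(line: str) -> type:
    """Return the Python type that a single JSON parse produces for "params"."""
    return type(json.loads(line)["params"])


def ad_is_ready(line: str) -> str:
    """Extract AdIsReady from the string-encoded form by parsing twice."""
    record = json.loads(line)              # first parse: the outer record
    params = json.loads(record["params"])  # second parse: the embedded JSON text
    return params["AdIsReady"]


if __name__ == "__main__":
    print(params_type(RAW_LINE))     # <class 'str'>
    print(params_type(NESTED_LINE))  # <class 'dict'>
    print(ad_is_ready(RAW_LINE))     # false
```

The "parse twice" step is the same idea the Athena queries below express with json_parse.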

If you can't change the data so that params is a nested object, you can change the table definition so that params is a string instead:

CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
  ...
  `params` string
)
...
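Conversely, if you can change the data, one option is to decode params in a preprocessing step before the files are uploaded to S3, so that the original map<string,string> column definition works. The following is a minimal Python sketch, assuming the input is already one JSON record per line (Athena's JSON SerDes expect each record on its own line). unnest_params is a hypothetical helper, and reading from or writing to S3 is left out.

```python
import json
from typing import Iterable, Iterator


def unnest_params(lines: Iterable[str]) -> Iterator[str]:
    """Yield each record with a string-encoded "params" decoded into an object.

    Each output item is one compact JSON object, suitable for writing
    one record per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        params = record.get("params")
        if isinstance(params, str):
            # Decode the embedded JSON text into a nested object.
            record["params"] = json.loads(params)
        yield json.dumps(record, separators=(",", ":"))


if __name__ == "__main__":
    sample = [r'{"eventName":"Rewarded Ads Ad Is Ready","params":"{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}']
    for out in unnest_params(sample):
        print(out)
```

Records whose params is already an object, or missing, pass through unchanged, so the step is safe to re-run on partially converted data.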

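With params defined as a string, Athena (Presto) lets you parse the JSON inside the string and query values out of it.
There are at least two ways to parse, convert, and extract the value, depending on your preference: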

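SELECT
  -- Approach 1: parse the string as JSON, cast it to a map, then look up the key
  CAST(json_parse(params) AS MAP(varchar, varchar))['AdIsReady'] AS AdIsReady1,
  -- Approach 2: extract a scalar value with a JSONPath expression
  json_extract_scalar(json_parse(params), '$.AdIsReady') AS AdIsReady2
FROM RV_QA.RAAIR LIMIT 10;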