Accessing complex types in AWS Athena

Question

I use Glue to generate tables for Athena. Some of my columns hold nested array/struct values (complex types), and I cannot access those values in queries.

I have two tables; the problematic one is named "sample_parquet".

 ids (array<struct<idType:string,idValue:string>>)

The value of a cell is:

[{idtype=ttd_id, idvalue=cf275376-8116-4cad-a035-e241e14b1470}, {idtype=md5_email, idvalue=932babe184fb11c92b09b3e13e936124}]

I tried:

 select ids.idtype from sample_parquet limit 1

which produces:

SYNTAX_ERROR: line 1:8: Expression "ids" is not of type ROW

and:

select s.idtype from sample_parquet.ids s limit 1;

which produces:

SYNTAX_ERROR: line 1:22: Schema sample_parquet does not exist

I also tried:

select json_extract(ids, '$.idtype') as idtype from sample_parquet limit 1;

which produces:

SYNTAX_ERROR: line 8:8: Unexpected parameters (array(row(idtype varchar,idvalue varchar)), varchar(8)) for function json_extract. Expected: json_extract(varchar(x), JsonPath) , json_extract(json, JsonPath)

Thanks for your help.

Answer 1

You are trying to access the elements of an array as if it were a dictionary/key-value structure.

Use UNNEST to flatten the array; after that you can use the . operator on each element.
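A minimal sketch of that approach, using the table and column from the question. The aliases t and id are made up for illustration. It assumes an engine version in which unnesting an array of rows yields a single row-typed column; on engines that expand each row field into its own column, the alias list would instead need one name per field, for example t(idtype, idvalue).

```sql
-- One output row per element of the ids array.
-- "t" and "id" are illustrative aliases, not names from the original table.
SELECT id.idtype, id.idvalue
FROM sample_parquet
CROSS JOIN UNNEST(ids) AS t(id)
LIMIT 10;
```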

The AWS docs have more information on working with JSON and arrays.

Answer 2

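ids is a column of type array, not a relation (such as a table, view, or subquery). Confusingly, when working with nested types in Athena/Presto you have to stop thinking in terms of SQL and instead think as you would in a programming language.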

There are dedicated functions for arrays and maps, as well as lambda functions (none of them specific to the AWS service), that can be used to dig into nested types.

When you say SELECT ids.idtype …, I assume that what you are after could be written in JavaScript as ids.map((id) => id.idtype). In Athena/Presto this can be expressed as SELECT transform(ids, id -> id.idtype) ….
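As a self-contained sketch, the query below rebuilds the sample cell from the question inline so it can be run without the Glue table. The CTE name sample and the output alias idtypes are assumptions made for illustration.

```sql
-- Inline copy of the sample cell from the question, typed as array(row(idtype, idvalue)).
WITH sample AS (
  SELECT ARRAY[
    CAST(ROW('ttd_id', 'cf275376-8116-4cad-a035-e241e14b1470')
         AS ROW(idtype VARCHAR, idvalue VARCHAR)),
    CAST(ROW('md5_email', '932babe184fb11c92b09b3e13e936124')
         AS ROW(idtype VARCHAR, idvalue VARCHAR))
  ] AS ids
)
-- Apply the lambda to every element and collect the idtype fields into a new array.
SELECT transform(ids, id -> id.idtype) AS idtypes
FROM sample;
-- idtypes: [ttd_id, md5_email]
```

Against the real table, the same SELECT would read FROM sample_parquet instead of the CTE.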

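The result of transform will be a relation with a column of type array<string>. If you want each element of that array as a separate row you need to use UNNEST, but if you instead want only the first value you can use the element_at function. There are also other functions you may be familiar with, such as filter, slice, and flatten, which produce new arrays, and reduce, which produces a scalar value.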
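The options described above could look roughly like this against the table from the question. The output aliases (first_idtype, idtype, md5_email) and the choice of 'md5_email' as a filter value are illustrative assumptions.

```sql
-- First idtype in the array (element_at uses 1-based indexes).
SELECT element_at(transform(ids, id -> id.idtype), 1) AS first_idtype
FROM sample_parquet;

-- One row per idtype: unnest the array of strings produced by transform.
SELECT t.idtype
FROM sample_parquet
CROSS JOIN UNNEST(transform(ids, id -> id.idtype)) AS t(idtype);

-- Value of the first element whose idtype is 'md5_email':
-- filter keeps the matching structs, transform extracts their idvalue.
SELECT element_at(
         transform(filter(ids, id -> id.idtype = 'md5_email'),
                   id -> id.idvalue),
         1) AS md5_email
FROM sample_parquet;
```

Unnesting the transformed array of strings, rather than the array of structs itself, keeps the alias list to a single scalar column.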
