unnest() not exploding array, returns error "Column alias list has 1 entries but 't' has 2 columns available"

I have some JSON data that includes a property 'characters', which looks like this:

select json_data['characters'] from latest_snapshot_events

Returns: [{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]

This is returned as a single row. I want one row per item in the array.

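I found some SO posts and other blogs suggesting that I use unnest(). I have tried this several times but cannot get it to return a result. For example, here is the documentation from Presto. Toward the bottom it covers unnest as a replacement for Hive's lateral view explode: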

SELECT student, score
FROM tests
CROSS JOIN UNNEST(scores) AS t (score);

So I tried applying this to my table:

characters as (
select
  jdata.characters
from latest_snapshot_events
cross join unnest(json_data) as t(jdata)
)
select * from characters;

Here json_data is the field in latest_snapshot_events that contains the property 'characters', which is an array as shown above.

This returns an error:

[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 69:12: Column alias list has 1 entries but 't' has 2 columns available

How can I unnest/explode latest_snapshot_events.json_data['characters'] into multiple rows?

Since characters is a JSON array in its textual representation, you have to:

  1. Parse the JSON text with json_parse to produce a value of type JSON.
  2. Convert the JSON value into a SQL array using CAST.
  3. Explode the array using UNNEST.

For example:

WITH data(characters) AS (
    VALUES '[{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]'
)
SELECT entry
FROM data, UNNEST(CAST(json_parse(characters) AS array(json))) t(entry)

This produces:

                               entry
-----------------------------------------------------------------------
 {"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,...
 {"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,...
 {"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,...
 {"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,...

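In the example above, I convert the JSON value to array(json), but you can convert it further to something more specific if each array entry has a regular schema. For example, your data can be cast to array(map(varchar, json)), since every element of the array is a JSON object.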
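
As a rough sketch of that more specific cast, the query below uses a shortened two-entry version of the sample data (trimmed only for readability) and casts it to array(map(varchar, json)), so each field can be looked up by key and then cast to a SQL type. The output column names item, item_power, and item_category are illustrative, not part of the original data.

```sql
-- Shortened sample data: two entries, three fields each.
WITH data(characters) AS (
    VALUES '[{"ITEM":10,"ITEM_POWER":60,"ITEM_CATEGORY":"Character"},{"ITEM":39,"ITEM_POWER":50,"ITEM_CATEGORY":"Character"}]'
)
SELECT
    -- Each map value is still of type JSON, so cast it to the SQL type you need.
    CAST(entry['ITEM'] AS bigint)           AS item,
    CAST(entry['ITEM_POWER'] AS integer)    AS item_power,
    CAST(entry['ITEM_CATEGORY'] AS varchar) AS item_category
FROM data
CROSS JOIN UNNEST(CAST(json_parse(characters) AS array(map(varchar, json)))) AS t(entry)
```

The ITEM values in the full data are large, which is why bigint is used for that column here.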

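json_parse works if your initial data is a JSON string. However, for array(row) types (i.e. an array of objects/dictionaries), casting to array(json) converts each row into an array, removing all keys from the objects and preventing you from using dot notation or the json_extract functions.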

To unnest array(row) data, the syntax is much simpler:

CROSS JOIN UNNEST(my_array) AS my_row
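
A minimal sketch of what that can look like, assuming a recent Presto/Trino or Athena engine in which unnesting an array(row) yields one column per row field (older engines may behave differently). The inline data, the field names item and item_power, and the alias t are all made up for illustration.

```sql
-- Hypothetical inline table with a single array(row) column.
WITH data(my_array) AS (
    VALUES ARRAY[
        CAST(ROW(10, 60) AS ROW(item bigint, item_power integer)),
        CAST(ROW(39, 50) AS ROW(item bigint, item_power integer))
    ]
)
SELECT t.item, t.item_power
FROM data
-- One alias per row field, assuming the engine expands rows into columns.
CROSS JOIN UNNEST(my_array) AS t(item, item_power)
```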

I ran into this error while trying to unpivot data.

This might help someone:

SELECT a_col, b_col
FROM
(
SELECT MAP(
        ARRAY['a', 'b', 'c', 'd'],
        ARRAY[1, 2, 3, 4]
       ) my_col
) CROSS JOIN UNNEST(my_col) as t(a_col, b_col)

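t() lets you define multiple output columns. Unnesting a map yields two columns, one for the keys and one for the values, so the alias list needs two entries.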
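
This likely also explains the error in the question. If json_data is a map (an assumption; the question does not state its type), UNNEST(json_data) produces a key column and a value column, so t(jdata) supplies too few aliases. Below is a sketch of unnesting only the 'characters' entry instead. The inline table stands in for latest_snapshot_events, its contents are invented, and the entry is assumed to be stored as JSON text.

```sql
-- Stand-in for latest_snapshot_events; assumes json_data is map(varchar, varchar).
WITH latest_snapshot_events(json_data) AS (
    VALUES MAP(
        ARRAY['characters', 'level'],
        ARRAY['[{"ITEM":10},{"ITEM":39}]', '7']
    )
)
SELECT entry
FROM latest_snapshot_events
-- Pick the one map entry first, then parse, cast, and unnest it.
CROSS JOIN UNNEST(CAST(json_parse(json_data['characters']) AS array(json))) AS t(entry)
```

Unnesting json_data['characters'] rather than json_data itself means a single alias in t(entry) is enough.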