Convert struct to map in Spark SQL
I am trying to convert a dataset that declares a column to have a specific struct type (e.g. struct<x: string, y: string>) into the map<string, string> type. I would like to do it in SQL, possibly without using a UDF.
Update:
My requirement is also that the transformation is done generically without any prior knowledge of the struct keys (in my problem I am getting data in a complex JSON, and I don't want to keep that complexity in the schema).
Sample input data:
WITH input (struct_col) as (
select named_struct('x', 'valX', 'y', 'valY') union all
select named_struct('x', 'valX1', 'y', 'valY2')
)
select *
from input
The expected output is a column of type map<string, string>:
struct_col: map<string, string>
{"x":"valX","y":"valY"}
{"x":"valX1","y":"valY2"}
Update:
So far I have managed to find only this very convoluted solution, which works only with Spark >= 3.1.0 (because of the json_object_keys function). It would be great to be able to cast a struct to a map directly.
WITH input (struct_col) as (
select named_struct('x', 'valX', 'y', 'valY') union all
select named_struct('x', 'valX1', 'y', 'valY2')
)
select transform_values(
    map_from_arrays(
        json_object_keys(to_json(struct_col)),
        json_object_keys(to_json(struct_col))
    ),
    (k, v) -> get_json_object(to_json(struct_col), '$.' || k))
from input
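For intuition, the key-listing approach above can be sketched in plain Python with the standard json module (this is not Spark: json_object_keys roughly corresponds to listing the top-level keys of the parsed object, and get_json_object to looking a key back up in the JSON document):

```python
import json

# to_json(struct_col) ~ the struct serialized as a JSON string
row_json = json.dumps({"x": "valX1", "y": "valY2"})

# json_object_keys(to_json(struct_col)) ~ the top-level key names
keys = list(json.loads(row_json).keys())

# map_from_arrays(keys, keys) followed by transform_values with
# get_json_object ~ replace each placeholder value by looking the
# key back up in the JSON document
result = {k: json.loads(row_json)[k] for k in keys}

print(result)  # {'x': 'valX1', 'y': 'valY2'}
```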
I found a way that requires serializing to JSON and parsing it back, using the to_json and from_json functions. The trick is that from_json also takes a schema argument, for which I use the map<string, string> type.
Additionally, this solution should work with Spark < 3.x:
WITH input (struct_col) as (
select named_struct('x', 'valX', 'y', 'valY')
union all
select named_struct('x', 'valX1', 'y', 'valY2')
)
select from_json(to_json(struct_col), 'map<string, string>') as map_col
from input;
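The reason this round-trip works can be illustrated with a small plain-Python sketch (stdlib json only, not Spark): serializing the struct discards its fixed-field typing, and reparsing with a map schema reads the keys back generically, with no prior knowledge of them.

```python
import json

# A "struct" value, represented here as a dict with fixed fields.
struct_val = {"x": "valX", "y": "valY"}

# to_json: serialize the struct to a JSON string.
as_json = json.dumps(struct_val)

# from_json with a map<string, string> schema: parse it back as a
# generic string-to-string mapping; the keys come from the data.
as_map = dict(json.loads(as_json))

print(as_map)  # {'x': 'valX', 'y': 'valY'}
```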
What about
create_map('x', struct_col.x, 'y', struct_col.y)
(Note that this requires spelling out the struct keys, so it does not satisfy the generic, schema-agnostic requirement from the update.)