How to read a string value in a JSON array struct?
Here is my code:
df_05_body = spark.sql("""
select
gtin
, principalBody.constituents
from
v_df_04""")
df_05_body.createOrReplaceTempView("v_df_05_body")
df_05_body.printSchema()
Here is the schema:
root
|-- gtin: array (nullable = true)
| |-- element: string (containsNull = true)
|-- constituents: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: struct (containsNull = true)
| | | |-- constituentCategory: struct (nullable = true)
| | | | |-- value: string (nullable = true)
| | | | |-- valueRange: string (nullable = true)
How do I change the principalBody.constituents line in the SQL so that it reads the fields constituentCategory.value and constituentCategory.valueRange?
The column constituents is an array of arrays of structs. If your goal is a flat structure, you need to flatten the nested arrays and then explode them:
df_05_body = spark.sql("""
WITH
v_df_04_exploded AS (
SELECT
gtin,
explode(flatten(principalBody.constituents)) AS constituent
FROM
v_df_04 )
SELECT
gtin,
constituent.constituentCategory.value,
constituent.constituentCategory.valueRange
FROM
v_df_04_exploded
""")
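To see what flatten followed by explode does to this schema, here is a pure-Python sketch with invented sample data (the gtin and category values are hypothetical, not from your dataset):

```python
# Hypothetical sample rows mirroring the schema: constituents is an
# array of arrays of structs (all values invented for illustration).
rows = [
    {
        "gtin": ["00012345678905"],
        "constituents": [
            [
                {"constituentCategory": {"value": "COTTON", "valueRange": None}},
                {"constituentCategory": {"value": "WOOL", "valueRange": "10-20%"}},
            ],
            [
                {"constituentCategory": {"value": "ELASTANE", "valueRange": "1-5%"}},
            ],
        ],
    }
]

def flatten(nested):
    # Like Spark's flatten(): removes one level of array nesting.
    return [item for inner in nested for item in inner]

def explode_constituents(row):
    # Like Spark's explode(): emits one output row per array element.
    for c in flatten(row["constituents"]):
        yield {
            "gtin": row["gtin"],
            "value": c["constituentCategory"]["value"],
            "valueRange": c["constituentCategory"]["valueRange"],
        }

flat = [out for row in rows for out in explode_constituents(row)]
# One flat row per struct across both inner arrays.
```

Each struct in the nested arrays becomes its own row, with value and valueRange as plain columns.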
Or simply use inline after flatten, like this:
df_05_body = spark.sql("""
SELECT
gtin,
inline(flatten(principalBody.constituents))
FROM
    v_df_04
""")
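One caveat with inline: it turns each top-level field of the struct into a column, and here the only top-level field is constituentCategory, so you would still access constituentCategory.value afterwards. A pure-Python sketch of that behavior, with invented sample data:

```python
# Hypothetical flattened array<struct>, as inline() would see it
# after flatten() (values invented for illustration).
flattened = [
    {"constituentCategory": {"value": "COTTON", "valueRange": None}},
    {"constituentCategory": {"value": "WOOL", "valueRange": "10-20%"}},
]

def inline(structs):
    # Like Spark's inline(): one output row per struct, with each
    # top-level struct field as its own column. The only top-level
    # field here is constituentCategory, itself still a struct.
    return [{"constituentCategory": s["constituentCategory"]} for s in structs]

table = inline(flattened)
```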