Extract value from array in Spark
I am trying to extract a value from an array in Spark SQL, but I get the error shown below.
Sample column:
customer_details
{"original_customer_id":"ch_382820","first_customer_id":"ch_343948"}
I am using this code:
get_json_object(customer_details, '$.original_customer_id') as customer_id
But I get the following error:
error: invalid string interpolation $., expected: $$, $identifier or ${expression}
spark.sql(s"""
error: unclosed character literal (or use " not ' for string literal)
get_json_object(customer_details, '$.original_customer_id') as customer_id,
The following works for me:
val df = Seq("{'original_customer_id':'ch_382820','first_customer_id':'ch_343948'}").toDF("customer_details")
df.show(truncate=false)
// +--------------------------------------------------------------------+
// |customer_details |
// +--------------------------------------------------------------------+
// |{'original_customer_id':'ch_382820','first_customer_id':'ch_343948'}|
// +--------------------------------------------------------------------+
df.selectExpr("get_json_object(customer_details, '$.original_customer_id') as customer_id").show()
// +-----------+
// |customer_id|
// +-----------+
// | ch_382820|
// +-----------+
As requested, here is the Spark SQL version:
select get_json_object(customer_details, '$.original_customer_id') as customer_id
from df
df.createOrReplaceTempView("df")
spark.sql(
"""
select get_json_object(customer_details, '$.original_customer_id') as customer_id
from df
"""
).show()
// +-----------+
// |customer_id|
// +-----------+
// | ch_382820|
// +-----------+
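As for the error in the question: the query there is wrapped in s""" ... """, and inside an s-interpolated string Scala reads `$.` as the start of an interpolation, which matches the first message. The second message is probably a follow-on from the same parse failure. A minimal sketch of the two usual workarounds, using plain Scala with no Spark dependency; the object name and the `table` value are made up for illustration:

```scala
object InterpolationSketch {
  // Hypothetical value injected through interpolation, for illustration only.
  val table = "df"

  // Inside an s-interpolated string, "$$" is the escape for a literal "$".
  val interpolated: String =
    s"""select get_json_object(customer_details, '$$.original_customer_id') as customer_id from $table"""

  // Without the s prefix, "$" is an ordinary character and needs no escaping.
  val plain: String =
    """select get_json_object(customer_details, '$.original_customer_id') as customer_id from df"""

  def main(args: Array[String]): Unit = {
    // Both approaches build the same query text.
    println(interpolated)
    println(interpolated == plain)
  }
}
```

Either string can then be passed to spark.sql(...). If nothing needs to be interpolated into the query, dropping the s prefix, as in the snippet above, is the simpler option.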