Explode array into columns Spark
Hi, I have a JSON like below:
{"meta":{"clusters":[{"1":"Aged 35 to 49"},{"2":"Male"},{"5":"Aged 15 to 17"}]}}
I want to get the following DataFrame:
+-------------+----+-------------+
|            1|   2|            5|
+-------------+----+-------------+
|Aged 35 to 49|Male|Aged 15 to 17|
+-------------+----+-------------+
How can I do this in PySpark?
Thanks
You can use the get_json_object() function to parse the JSON column.
Example:
from pyspark.sql import Row

df = spark.createDataFrame([Row(jsn='{"meta":{"clusters":[{"1":"Aged 35 to 49"},{"2":"Male"},{"5":"Aged 15 to 17"}]}}')])

df.selectExpr("get_json_object(jsn, '$.meta.clusters[0].1') as `1`",
              "get_json_object(jsn, '$.meta.clusters[1].2') as `2`",
              "get_json_object(jsn, '$.meta.clusters[2].5') as `5`").show(10, False)

Each array element is indexed directly ([0], [1], [2]) rather than with the [*] wildcard; the wildcard returns the matches as JSON fragments, so the values would come back with surrounding quotes.
Output:
+-------------+----+-------------+
|1            |2   |5            |
+-------------+----+-------------+
|Aged 35 to 49|Male|Aged 15 to 17|
+-------------+----+-------------+