Spark DF column to string JSON

I have a DataFrame like this:

+------------+-------------------------------------------------------------+
|pk_attr_name|pk_struct                                                    |
+------------+-------------------------------------------------------------+
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
+------------+-------------------------------------------------------------+

I want to build a single JSON string from the pk_struct column. Expected output:

pk_struct_str = '[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"},{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]'

I tried:

pk_df.select(F.to_json(F.struct("pk_struct")).alias("json")).show(truncate=False)

but it doesn't give me the result I want.

pk_df.printSchema()
root
 |-- pk_attr_name: string (nullable = true)
 |-- pk_struct: string (nullable = true)
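The schema explains why the attempt above fails: pk_struct is already a JSON *string*, so wrapping it in struct() and calling to_json() serializes it a second time, escaping the quotes, and still yields one value per row rather than one array. The plain-Python sketch below mimics that double encoding for a single row with the standard json module; the row dict is a hypothetical stand-in for a pk_df row, not Spark code.

```python
import json

# Hypothetical stand-in for one row of pk_df: pk_struct holds a JSON *string*,
# matching the printSchema() output (pk_struct: string).
row = {
    "pk_attr_name": "CLNT_GRP_CD",
    "pk_struct": '{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"}',
}

# Roughly what to_json(struct("pk_struct")) does per row: wrap the column in an
# object and serialize it, so the string value gets escaped instead of embedded.
wrapped = json.dumps({"pk_struct": row["pk_struct"]}, separators=(",", ":"))
print(wrapped)
# {"pk_struct":"{\"pk_seq\":1,\"pk_attr_id\":20209,\"pk_attr_name\":\"CLNT_GRP_CD\"}"}
```

Decoding the result twice recovers the original object, which confirms the value was encoded twice.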

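You can achieve this with the collect_list or collect_set function (collect_set also drops duplicate values). Both are aggregate functions, so they have to be used inside an aggregation. I therefore created a dummy column, grouped by its value, and used collect_list in the aggregation: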

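from pyspark.sql.functions import lit, collect_list

df.show(2, False)
# Add a constant column so every row falls into the same group
df1 = df.withColumn("dummy", lit("XXX"))
# Collect the pk_struct values of that single group into one array
df2 = df1.groupBy("dummy").agg(collect_list(df1.pk_struct))
df2.show(2, False)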

+------------+-------------------------------------------------------------+
|pk_attr_name|pk_struct                                                    |
+------------+-------------------------------------------------------------+
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
+------------+-------------------------------------------------------------+

+-----+-----------------------------------------------------------------------------------------------------------------------------+
|dummy|collect_list(pk_struct)                                                                                                      |
+-----+-----------------------------------------------------------------------------------------------------------------------------+
|XXX  |[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"}, {"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]|
+-----+-----------------------------------------------------------------------------------------------------------------------------+
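Note that collect_list returns an array column, not the single string shown in the question (show() also inserts a space after each comma when printing the array). One way to finish the job is to bring the values to the driver and join them; the sketch below shows that last step in plain Python, assuming the collected values arrive as a list of JSON strings (for example `[r.pk_struct for r in df.select("pk_struct").collect()]`, or the array cell `df2.first()[1]`). The helper name to_json_array_str is hypothetical. A Spark-side variant that should also work, though untested here, is `pk_df.agg(F.concat(F.lit("["), F.concat_ws(",", F.collect_list("pk_struct")), F.lit("]")))`; calling agg directly on the DataFrame aggregates over all rows, so the dummy column is not strictly required.

```python
import json

# Hypothetical stand-in for the collected pk_struct values (see lead-in).
pk_structs = [
    '{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"}',
    '{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}',
]

def to_json_array_str(json_strings):
    """Join already-serialized JSON objects into one JSON array string."""
    items = list(json_strings)  # materialize so generators can be iterated twice
    for s in items:
        json.loads(s)  # fail fast if an element is not valid JSON
    return "[" + ",".join(items) + "]"

pk_struct_str = to_json_array_str(pk_structs)
print(pk_struct_str)
```

Joining the strings as-is keeps each object's original formatting, which is what makes the result match the expected pk_struct_str exactly. Keep in mind that collect_list does not guarantee row order after a shuffle, so sort by pk_seq first if the order matters.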