Append values from one column to another JSON column in the same dataframe
I have some data in a dataframe, as shown below:
+-----+-------+---------+-------------------------------+
|Noun |Pronoun|Adjective|Metadata                       |
+-----+-------+---------+-------------------------------+
|Homer|Simpson|Engineer |{"Age": "50", "Country": "USA"}|
|Elon |Musk   |King     |{"Age": "45", "Country": "RSA"}|
|Bart |Lee    |Cricketer|{"Age": "35", "Country": "AUS"}|
|Lisa |Jobs   |Daughter |{"Age": "35", "Country": "IND"}|
|Joe  |Root   |Player   |{"Age": "31", "Country": "ENG"}|
+-----+-------+---------+-------------------------------+
I want to append the value of another column (say Adjective) to the Metadata column, so that the final dataframe looks like this:
+-----+-------+---------+---------------------------------------------------------+
|Noun |Pronoun|Adjective|Metadata                                                 |
+-----+-------+---------+---------------------------------------------------------+
|Homer|Simpson|Engineer |{"Age": "50", "Country": "USA", "Adjective": "Engineer"} |
|Elon |Musk   |King     |{"Age": "45", "Country": "RSA", "Adjective": "King"}     |
|Bart |Lee    |Cricketer|{"Age": "35", "Country": "AUS", "Adjective": "Cricketer"}|
|Lisa |Jobs   |Daughter |{"Age": "35", "Country": "IND", "Adjective": "Daughter"} |
|Joe  |Root   |Player   |{"Age": "31", "Country": "ENG", "Adjective": "Player"}   |
+-----+-------+---------+---------------------------------------------------------+
Please suggest how to implement this.
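Assuming your Metadata column contains JSON strings, you can first convert it to a MapType with the from_json function, then add the column you want with map_concat, and finally convert it back to a JSON string with to_json: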
import org.apache.spark.sql.functions.{col, from_json, lit, map, map_concat, to_json}

val df2 = df.withColumn(
  "Metadata",
  // Parse the JSON string into a map<string,string> column
  from_json(col("Metadata"), lit("map<string,string>"))
).withColumn(
  "Metadata",
  // Merge in a one-entry map ("Adjective" -> value of Adjective), then serialize back to JSON
  to_json(map_concat(col("Metadata"), map(lit("Adjective"), col("Adjective"))))
)

df2.show(false)

//+-----+-------+---------+----------------------------------------------------+
//|Noun |Pronoun|Adjective|Metadata                                            |
//+-----+-------+---------+----------------------------------------------------+
//|Homer|Simpson|Engineer |{"Age":"50","Country":"USA","Adjective":"Engineer"} |
//|Elon |Musk   |King     |{"Age":"45","Country":"RSA","Adjective":"King"}     |
//|Bart |Lee    |Cricketer|{"Age":"35","Country":"AUS","Adjective":"Cricketer"}|
//|Lisa |Jobs   |Daughter |{"Age":"35","Country":"IND","Adjective":"Daughter"} |
//|Joe  |Root   |Player   |{"Age":"31","Country":"ENG","Adjective":"Player"}   |
//+-----+-------+---------+----------------------------------------------------+
This can also be done by converting to a StructType instead of a MapType, but a map is more generic in this case.
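For comparison, here is a minimal sketch of what the StructType variant might look like. It assumes the JSON in Metadata always carries exactly the string fields Age and Country, and that df is the dataframe from the question; the names metadataSchema and df3 are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed fixed schema of the JSON stored in Metadata (hypothetical; adjust to your data)
val metadataSchema = StructType(Seq(
  StructField("Age", StringType),
  StructField("Country", StringType)
))

val df3 = df
  // Parse the JSON string into a struct with the known fields
  .withColumn("Metadata", from_json(col("Metadata"), metadataSchema))
  // Rebuild the struct with the extra field, then serialize it back to JSON
  .withColumn(
    "Metadata",
    to_json(struct(
      col("Metadata.Age").as("Age"),
      col("Metadata.Country").as("Country"),
      col("Adjective")
    ))
  )

df3.show(false)
```

With a struct, any key that is not listed in the schema is dropped during parsing, and every field has to be named when the struct is rebuilt. The map version keeps whatever keys happen to be present, which is why it is the more generic choice here.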