嵌套Json提取中间key未知的值
Nested Json extract the value with unknown key in the middle
我在这样的数据框中有一个 Json 列 (colJson)
{
"a": "value1",
"b": "value1",
"c": true,
"details": {
"qgiejfkfk123": { //unknown value
"model1": {
"score": 0.531,
"version": "v1"
},
"model2": {
"score": 0.840,
"version": "v2"
},
"other_details": {
"decision": false,
"version": "v1"
}
}
}
}
这里的'qgiejfkfk123'是动态值,随着每一行的变化而变化。但是我需要提取 model1.score 以及 model2.score.
我试过了
sourceDf.withColumn("model1_score",get_json_object(col("colJson"), "$.details.*.model1.score").cast(DoubleType))
.withColumn("model2_score",get_json_object(col("colJson"), "$.details.*.model2.score").cast(DoubleType))
但是没用。
我设法通过使用 from_json
获得了您的解决方案,将动态值解析为 Map 并从中展开值:
val schema = "STRUCT<`details`: MAP<STRING, STRUCT<`model1`: STRUCT<`score`: DOUBLE, `version`: STRING>, `model2`: STRUCT<`score`: DOUBLE, `version`: STRING>, `other_details`: STRUCT<`decision`: BOOLEAN, `version`: STRING>>>>"
val fromJsonDf = sourceDf.withColumn("colJson", from_json(col("colJson"), lit(schema)))
val explodeDf = fromJsonDf.select($"*", explode(col("colJson.details")))
// +----------------------------------------------------------+------------+--------------------------------------+
// |colJson |key |value |
// +----------------------------------------------------------+------------+--------------------------------------+
// |{{qgiejfkfk123 -> {{0.531, v1}, {0.84, v2}, {false, v1}}}}|qgiejfkfk123|{{0.531, v1}, {0.84, v2}, {false, v1}}|
// +----------------------------------------------------------+------------+--------------------------------------+
val finalDf = explodeDf.select(col("value.model1.score").as("model1_score"), col("value.model2.score").as("model2_score"))
// +------------+------------+
// |model1_score|model2_score|
// +------------+------------+
// | 0.531| 0.84|
// +------------+------------+
我在这样的数据框中有一个 Json 列 (colJson)
{
"a": "value1",
"b": "value1",
"c": true,
"details": {
"qgiejfkfk123": { //unknown value
"model1": {
"score": 0.531,
"version": "v1"
},
"model2": {
"score": 0.840,
"version": "v2"
},
"other_details": {
"decision": false,
"version": "v1"
}
}
}
}
这里的'qgiejfkfk123'是动态值,随着每一行的变化而变化。但是我需要提取 model1.score 以及 model2.score.
我试过了
sourceDf.withColumn("model1_score",get_json_object(col("colJson"), "$.details.*.model1.score").cast(DoubleType))
.withColumn("model2_score",get_json_object(col("colJson"), "$.details.*.model2.score").cast(DoubleType))
但是没用。
我设法通过使用 from_json
获得了您的解决方案,将动态值解析为 Map
val schema = "STRUCT<`details`: MAP<STRING, STRUCT<`model1`: STRUCT<`score`: DOUBLE, `version`: STRING>, `model2`: STRUCT<`score`: DOUBLE, `version`: STRING>, `other_details`: STRUCT<`decision`: BOOLEAN, `version`: STRING>>>>"
val fromJsonDf = sourceDf.withColumn("colJson", from_json(col("colJson"), lit(schema)))
val explodeDf = fromJsonDf.select($"*", explode(col("colJson.details")))
// +----------------------------------------------------------+------------+--------------------------------------+
// |colJson |key |value |
// +----------------------------------------------------------+------------+--------------------------------------+
// |{{qgiejfkfk123 -> {{0.531, v1}, {0.84, v2}, {false, v1}}}}|qgiejfkfk123|{{0.531, v1}, {0.84, v2}, {false, v1}}|
// +----------------------------------------------------------+------------+--------------------------------------+
val finalDf = explodeDf.select(col("value.model1.score").as("model1_score"), col("value.model2.score").as("model2_score"))
// +------------+------------+
// |model1_score|model2_score|
// +------------+------------+
// | 0.531| 0.84|
// +------------+------------+