Index a map by the value of a different column in Spark
I have a DataFrame with the following schema:
|-- A: map (nullable = true)
| |-- key: string
| |-- value: array (valueContainsNull = true)
| | |-- element: struct (containsNull = true)
| | | |-- uid: string (nullable = true)
| | | |-- price: double (nullable = true)
| | | |-- type: string (nullable = true)
|-- keyindex: string (nullable = true)
For example, suppose I have the following data:
{"A":{
"innerkey_1":[{"uid":"1","price":0.01,"recordtype":"STAT"},
{"uid":"6","price":4.3,"recordtype":"DYN"}],
"innerkey_2":[{"uid":"2","price":2.01,"recordtype":"DYN"},
{"uid":"4","price":6.1,"recordtype":"DYN"}]},
"keyindex":"innerkey_2"}
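I read the data into a DataFrame using the following schema:
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("A", MapType(StringType, ArrayType(
    new StructType()
      .add("uid", StringType)
      .add("price", DoubleType)
      .add("recordtype", StringType),
    containsNull = true)))
  .add("keyindex", StringType)
I am trying to figure out whether I can use keyindex to select a value from the map. Since keyindex in the example is "innerkey_2", I would like the output to be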
[{"uid":"2","price":2.01,"recordtype":"DYN"},
{"uid":"4","price":6.1,"recordtype":"DYN"}]
Thanks for your help!
getItem should do the trick:
scala> val df = Seq(("innerkey2", Map("innerkey2" -> Seq(("1", 0.01, "STAT"))))).toDF("keyindex", "A")
df: org.apache.spark.sql.DataFrame = [keyindex: string, A: map<string,array<struct<_1:string,_2:double,_3:string>>>]
scala> df.select($"A"($"keyindex")).show
+---------------+
| A[keyindex]|
+---------------+
|[[1,0.01,STAT]]|
+---------------+
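The same lookup can be sketched as a standalone program against the question's data shape. This is a minimal sketch, not a definitive implementation: the `Record` case class, the `selectByKeyIndex` helper, and the `selected` column name are hypothetical names introduced here, and it assumes a local Spark session is available. It uses the same `Column.apply` form shown in the REPL session above, passing the `keyindex` column as the map key.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical element type mirroring the uid/price/recordtype fields in the question.
case class Record(uid: String, price: Double, recordtype: String)

object MapLookupByColumn {
  // Adds a "selected" column (hypothetical name) holding the entry of map column A
  // whose key equals the value of keyindex in the same row.
  def selectByKeyIndex(df: DataFrame): DataFrame =
    df.withColumn("selected", col("A")(col("keyindex")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-lookup-by-column")
      .getOrCreate()
    import spark.implicits._

    // One row shaped like the question's example: a map of arrays plus the key to look up.
    val df = Seq(
      (Map(
        "innerkey_1" -> Seq(Record("1", 0.01, "STAT"), Record("6", 4.3, "DYN")),
        "innerkey_2" -> Seq(Record("2", 2.01, "DYN"), Record("4", 6.1, "DYN"))),
       "innerkey_2")
    ).toDF("A", "keyindex")

    // Show only the array selected by keyindex.
    selectByKeyIndex(df).select("selected").show(truncate = false)

    spark.stop()
  }
}
```

Because the key is taken from a column rather than a literal, each row can select a different entry from its own map.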