Spark: convert JSON data to a DataFrame using Scala

Input file one.txt:

[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]

Expected output:

a       b       c
1,11,1  2,12,2  3,13,3

Could you please provide a solution for this in a Spark DataFrame using Scala?

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()
val data = """[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]"""    // contents of one.txt
// read.text loads each line as a single string column named "value"
val df = spark.read.text("./src/main/scala/resources/text/one.txt")
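
For reference, a minimal sketch (assuming the same one.txt path) that reads the file with read.json instead of read.text, so Spark infers the a/b/c schema and explodes the top-level JSON array into one row per object:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()

// read.json parses each line as a JSON document; a top-level JSON array
// produces one row per element
val jsonDf = spark.read.json("./src/main/scala/resources/text/one.txt")
jsonDf.show()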

Here is the working Python version of the code with Spark. It's fine if you can convert it; otherwise let me know and I'll do it myself.

df = spark.read.json(sc.parallelize([{"a":1,"b":2,"c":3},{"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]))
df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|  3|
| 11| 12| 13|
|  1|  2|  3|
+---+---+---+

from pyspark.sql.functions import col, collect_list, concat_ws

df.agg(*[concat_ws(",", collect_list(col(i))).alias(i) for i in df.columns]).show()
+------+------+------+
|     a|     b|     c|
+------+------+------+
|1,11,1|2,12,2|3,13,3|
+------+------+------+

For Scala Spark:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()
import spark.implicits._

val jsonStr = """[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]"""

// Parse the JSON string: the top-level array becomes one row per object
val df = spark.read.json(spark.createDataset(jsonStr :: Nil))

// Collect each column into a list and join the values with commas,
// mirroring the Python version above
val exprs = df.columns.map(c => concat_ws(",", collect_list(col(c))).alias(c))
df.agg(exprs.head, exprs.tail: _*).show()
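
This prints the comma-joined columns from the expected output. If plain array columns are enough, the Map-based aggregation from the original snippet also works, returning collect_list(a) etc. as arrays instead of strings:

// Alternative: Map-based agg returns array columns, e.g. collect_list(a) = [1, 11, 1]
val exprMap = df.columns.map(_ -> "collect_list").toMap
df.agg(exprMap).show()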