Spark: convert JSON data to DataFrame using Scala
Input file one.txt:
[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]
Expected output:
a      b      c
1,11,1 2,12,2 3,13,3
Could you please provide a solution for this with a Spark DataFrame, using Scala?
val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()
val data = """[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]""" // contents of one.txt
val df = spark.read.text("./src/main/scala/resources/text/one.txt").toDF()
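Note that `spark.read.text` gives you a single string column (`value`), not parsed JSON. If you want to read one.txt directly, a sketch that should work, assuming the file contains the JSON array shown above (the path is the one from your snippet):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()

// A file holding a single top-level JSON array needs multiLine mode;
// by default Spark expects one JSON object per line (JSON Lines format).
val df = spark.read
  .option("multiLine", true)
  .json("./src/main/scala/resources/text/one.txt")

df.show()
// +---+---+---+
// |  a|  b|  c|
// +---+---+---+
// |  1|  2|  3|
// | 11| 12| 13|
// |  1|  2|  3|
// +---+---+---+
```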
Here is a working Python version of the code with Spark. It's fine if you can convert it; otherwise tell me and I'll do it.
df = spark.read.json(sc.parallelize([{"a":1,"b":2,"c":3},{"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]))
df.show()
+---+---+---+
| a| b| c|
+---+---+---+
| 1| 2| 3|
| 11| 12| 13|
| 1| 2| 3|
+---+---+---+
df.agg(*[concat_ws(",",collect_list(col(i))).alias(i) for i in df.columns]).show()
+------+------+------+
| a| b| c|
+------+------+------+
|1,11,1|2,12,2|3,13,3|
+------+------+------+
For Scala Spark:
val spark = SparkSession.builder().appName("JSON_Sample").master("local[1]").getOrCreate()
import spark.implicits._
val jsonStr = """[{"a":1,"b":2,"c":3}, {"a":11,"b":12,"c":13},{"a":1,"b":2,"c":3}]"""
val df = spark.read.json(spark.createDataset(jsonStr :: Nil))
val exprs = df.columns.map(_ -> "collect_list").toMap
df.agg(exprs).show()
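Be aware that `agg` with a `Map` of `"collect_list"` produces array columns (named `collect_list(a)` and so on), not the comma-separated strings in the expected output. A closer translation of the Python one-liner, as a sketch assuming the same `df` and the usual Spark SQL function imports:

```scala
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

// Build one concat_ws(",", collect_list(col)) expression per column,
// mirroring the Python list comprehension.
val aggExprs = df.columns.map(c => concat_ws(",", collect_list(col(c))).alias(c))

// agg takes a head expression plus varargs, hence the head/tail split.
df.agg(aggExprs.head, aggExprs.tail: _*).show()
// +------+------+------+
// |     a|     b|     c|
// +------+------+------+
// |1,11,1|2,12,2|3,13,3|
// +------+------+------+
```

As with the Python version, `collect_list` does not formally guarantee row order, though on a single local partition it preserves input order here.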