I want to convert Parquet file data into the string format below. Can someone help me?
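I am trying to get the data in the format below. Can someone help me with a UDF in Spark and Scala? I am new to this.
The output I expect is a string:
Java|XX||Scala|XA
If the array contains more elements, the string should continue in the same pattern.
Please help me, this is a very important task.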
```
root
 |-- name: string (nullable = true)
 |-- booksIntersted: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- author: string (nullable = true)

+----------+-------------------------+
|name      |booksIntersted           |
+----------+-------------------------+
|James     |[[Java, XX], [Scala, XA]]|
|Michael   |[[Java, XY], [Scala, XB]]|
|Robert    |[[Java, XZ], [Scala, XC]]|
|Washington|null                     |
+----------+-------------------------+
```
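Check the code below.
// `$"..."` also needs `import spark.implicits._` (already imported in spark-shell)
import org.apache.spark.sql.functions.{expr, size, when}
val finalDF = df
  .withColumn(
    "booksIntersted",
    when(
      size($"booksIntersted") > 0,
      // build "name|author" for each element, then join the elements with "||"
      expr("concat_ws('||', transform(booksIntersted, x -> concat(x.name, '|', x.author)))")
    )
  )
finalDF.printSchema
root
 |-- name: string (nullable = true)
 |-- booksIntersted: string (nullable = true)
finalDF.show(false)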
+----------+-----------------+
|name      |booksIntersted   |
+----------+-----------------+
|James     |Java|XX||Scala|XA|
|Michael   |Java|XY||Scala|XB|
|Robert    |Java|XZ||Scala|XC|
|Washington|null             |
+----------+-----------------+
Writing the data in CSV format:
finalDF
  .repartition(1)
  .write
  .format("csv")
  .option("header", "true")
  .save("/tmp/csv/data")
cd /tmp/csv/data
> cat part-00000-c8527721-5b25-4689-bfe4-028ac2873def-c000.csv
name,booksIntersted
James,Java|XX||Scala|XA
Michael,Java|XY||Scala|XB
Robert,Java|XZ||Scala|XC
Washington,""
Using a UDF:
scala> val combine = udf((row: Seq[Row]) => {
  row
    .map(r => r.getAs[String]("name") + "|" + r.getAs[String]("author"))
    .mkString("||") // same result as reduce(_ + "||" + _), but does not throw on an empty array
})
scala> df
  .withColumn(
    "booksInterstedNew",
    when(
      size($"booksIntersted") > 0,
      combine($"booksIntersted")
    )
  )
  .show(false)
+----------+-------------------------+-----------------+
|name      |booksIntersted           |booksInterstedNew|
+----------+-------------------------+-----------------+
|James     |[[Java, XX], [Scala, XA]]|Java|XX||Scala|XA|
|Michael   |[[Java, XY], [Scala, XB]]|Java|XY||Scala|XB|
|Robert    |[[Java, XZ], [Scala, XC]]|Java|XZ||Scala|XC|
|Washington|[]                       |null             |
+----------+-------------------------+-----------------+
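To make the joining logic easier to follow outside Spark, here is a minimal plain-Scala sketch of the same idea. `Book`, `formatBooks`, and `BookFormat` are hypothetical names I made up for illustration; they are not part of Spark or of the code above. It assumes each array element has a `name` and an `author`, and it returns `None` for a null or empty array, which plays the role of the null in the output above.

```scala
object BookFormat {
  // Hypothetical stand-in for one struct element of the booksIntersted array.
  final case class Book(name: String, author: String)

  // Formats each book as "name|author" and joins the books with "||".
  // A null or empty input yields None instead of throwing.
  def formatBooks(books: Seq[Book]): Option[String] =
    Option(books)
      .filter(_.nonEmpty)
      .map(_.map(b => s"${b.name}|${b.author}").mkString("||"))

  def main(args: Array[String]): Unit = {
    println(formatBooks(Seq(Book("Java", "XX"), Book("Scala", "XA")))) // Some(Java|XX||Scala|XA)
    println(formatBooks(Seq.empty))                                    // None
    println(formatBooks(null))                                         // None
  }
}
```

Using `mkString` with a separator means the string simply keeps growing as the array gets more elements, so no special handling is needed for longer arrays.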