How to expand a list column in Scala to multiple rows
I would like to transform the following DataFrame:
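import org.apache.spark.sql.types._
// createDF is not part of core Spark; it is assumed here to come from spark-daria
import com.github.mrpowers.spark.daria.sql.SparkSessionExt._

val articledDF = spark.createDF(
  List(
    ("article 1", Array("topic 1", "topic 2")),
    ("article 2", Array("topic 1", "topic 3")),
    ("article 3", Array("topic 2"))
  ), List(
    // (column name, data type, nullable)
    ("article", StringType, true),
    ("topics", ArrayType(StringType, true), true)
  )
)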
which results in:
+---------+---------------------+
|  article|               topics|
+---------+---------------------+
|article 1| [topic 1, topic 2]|
|article 2| [topic 1, topic 3]|
|article 3| [topic 2]|
+---------+---------------------+
and expand the topics column so that each topic gets its own row:
+---------+-----------+
|  article|      topic|
+---------+-----------+
|article 1| topic 1 |
|article 1| topic 2 |
|article 2| topic 1 |
|article 2| topic 3 |
|article 3| topic 2 |
+---------+-----------+
I would be glad to learn how to do this.
Use explode:
import org.apache.spark.sql.functions._
import spark.implicits._
articledDF.select($"article", explode($"topics") as "topic")
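For readers who do not have createDF available, here is a minimal self-contained sketch of the same idea using only core Spark. The names articlesDF and explodedDF, the local SparkSession setup, and the use of toDF in place of createDF are assumptions made for illustration; they are not part of the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-sketch")
  .getOrCreate()
import spark.implicits._

// Same data as in the question, built with toDF instead of createDF.
val articlesDF = Seq(
  ("article 1", Seq("topic 1", "topic 2")),
  ("article 2", Seq("topic 1", "topic 3")),
  ("article 3", Seq("topic 2"))
).toDF("article", "topics")

// explode emits one output row per array element,
// repeating the other selected columns for each element.
val explodedDF = articlesDF.select(
  col("article"),
  explode(col("topics")).as("topic")
)

explodedDF.show()
```

Note that explode drops rows whose array is null or empty. If such articles should be kept, with a null topic, explode_outer from the same functions package can be used instead.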