Dataframe explode list columns into multiple rows
I have the dataframe below, where each column contains a list of values of the same size:
+--------------------+--------------------------+--------------------------+--------------------------+
|Country_1           |Country_2                 |Country_3                 |Country_4                 |
+--------------------+--------------------------+--------------------------+--------------------------+
|[1, 2, 3, 4, 5, 6]  |[x1, x2, x3, x4, x5, x6]  |[y1, y2, y3, y4, y5, y6]  |[v1, v2, v3, v4, v5, v6]  |
+--------------------+--------------------------+--------------------------+--------------------------+
I need to turn each list element into its own row for further processing. From the posts I have seen, I should use the explode function to somehow end up with this:
Country_1 Country_2 Country_3 Country_4
1 x1 y1 v1
2 x2 y2 v2
3 x3 y3 v3
4 x4 y4 v4
5 x5 y5 v5
6 x6 y6 v6
I have tried the code below, but without success so far.
data.withColumn("Country_1Country_2", F.arrays_zip("Country_1", "Country_2")) \
    .select("*", F.explode("Country_1Country_2").alias("tCountry_1Country_2")) \
    .select("*", "tCountry_1Country_2.Country_1", col("Country_1Country_2.Country_2")) \
    .show()
# This is not part of the solution, just creation of the data sample
# df = spark.sql("select stack(1, array(1, 2, 3, 4, 5, 6) ,array('x1', 'x2', 'x3', 'x4', 'x5', 'x6') ,array('y1', 'y2', 'y3', 'y4', 'y5', 'y6') ,array('v1', 'v2', 'v3', 'v4', 'v5', 'v6')) as (Country_1, Country_2,Country_3,Country_4)")
df.selectExpr('inline(arrays_zip(*))').show()
+---------+---------+---------+---------+
|Country_1|Country_2|Country_3|Country_4|
+---------+---------+---------+---------+
| 1| x1| y1| v1|
| 2| x2| y2| v2|
| 3| x3| y3| v3|
| 4| x4| y4| v4|
| 5| x5| y5| v5|
| 6| x6| y6| v6|
+---------+---------+---------+---------+
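Conceptually, inline(arrays_zip(...)) is a row-wise zip of the array columns: arrays_zip pairs up the i-th element of every array into a struct, and inline emits one row per struct. A minimal plain-Python sketch of the same reshaping, using hypothetical sample data mirroring the example above:

```python
# Hypothetical sample: each key plays the role of an array column of
# equal length, like the Country_* columns in the Spark dataframe.
data = {
    "Country_1": [1, 2, 3, 4, 5, 6],
    "Country_2": ["x1", "x2", "x3", "x4", "x5", "x6"],
    "Country_3": ["y1", "y2", "y3", "y4", "y5", "y6"],
    "Country_4": ["v1", "v2", "v3", "v4", "v5", "v6"],
}

# zip(*data.values()) groups the i-th element of every column together
# (the arrays_zip step); building one dict per tuple is the inline step.
rows = [dict(zip(data.keys(), values)) for values in zip(*data.values())]

for row in rows:
    print(row)
```

As a side note, in Spark 3.4+ the same transformation can also be written with the Python DataFrame API, since pyspark.sql.functions exposes both arrays_zip and inline.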