How to change an array of integers to individual columns in Spark (scala)?

Question

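I followed a solution for one-hot encoding. Now I want to split the last element of each tuple (which is an array of integers) so that each one-hot encoded variable gets its own column.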

My current RDD is:

scala> encode_cars
res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Array[Int])] = MapPartitionsRDD[17] at map at <console>:27

Ideally I would like something like this:

res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)] = MapPartitionsRDD[17] at map at <console>:27

I know this can be done with map / flatMap, but I don't know how.

Answer 1

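I found a simple solution: index into the array inside a map function. Note that Scala arrays are 0-indexed, so the first element is x._5(0):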

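encode_cars.map(x => (x._1, x._2, x._3, x._4,  // keep the four Double fields
  x._5(0), x._5(1), x._5(2), x._5(3), x._5(4), x._5(5), x._5(6)))  // one column per array element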
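Because the lambda passed to RDD.map is an ordinary Scala function, the same flattening can be written as a named function and checked on a local Seq without a Spark cluster. This is a minimal sketch, assuming every array has exactly 7 elements (matching the desired type in the question); the object name FlattenOneHot and the sample rows are hypothetical. Pattern matching on the array, instead of indexing, makes a row with the wrong array length fail loudly with a MatchError rather than throwing an index error or silently misaligning columns.

```scala
// Minimal sketch: flatten (Double x4, Array[Int]) into an 11-element tuple.
// Assumes every array has exactly 7 elements; names here are hypothetical.
object FlattenOneHot {
  type Row = (Double, Double, Double, Double, Array[Int])
  type FlatRow = (Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)

  // Destructure the tuple and the array in one pattern; an array of any
  // other length does not match and raises a scala.MatchError.
  def flatten(x: Row): FlatRow = x match {
    case (a, b, c, d, Array(e0, e1, e2, e3, e4, e5, e6)) =>
      (a, b, c, d, e0, e1, e2, e3, e4, e5, e6)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows standing in for encode_cars.
    val rows: Seq[Row] = Seq(
      (1.0, 2.0, 3.0, 4.0, Array(0, 0, 1, 0, 0, 0, 0)),
      (5.0, 6.0, 7.0, 8.0, Array(1, 0, 0, 0, 0, 0, 0))
    )
    // Seq.map and RDD.map accept the same function here.
    val flat: Seq[FlatRow] = rows.map(flatten)
    flat.foreach(println)
  }
}
```

With Spark, the same function could be passed as encode_cars.map(FlattenOneHot.flatten), which should yield an RDD of the 11-element tuple type shown in the question.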
