How to convert Mahout VectorWritable to Vector in Spark

I have a VectorWritable (org.apache.mahout.math.VectorWritable) that comes from a sequence file generated by Mahout, and I want to convert it to Spark's Vector type (org.apache.spark.mllib.linalg.Vector). How can I do this in Scala?

Assuming we have an RDD[(Text, VectorWritable)] from your previous question:

import scala.collection.JavaConverters.iterableAsScalaIterableConverter

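// Copies every element of the Mahout vector into a dense Spark MLlib vector.
def mahoutToScala(v: org.apache.mahout.math.VectorWritable): org.apache.spark.mllib.linalg.Vector = {
    // v.get unwraps the Mahout Vector; all() iterates over every element, zeros included.
    val scalaArray = v.get.all.asScala.map(_.get).toArray
    org.apache.spark.mllib.linalg.Vectors.dense(scalaArray)
}

// Convert the Hadoop Text keys to Strings and the values to MLlib vectors.
rdd.map { case (k, v) => (k.toString, mahoutToScala(v)) }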
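
If the Mahout vectors are mostly zeros, the dense conversion above allocates a full array for every row. Below is a minimal sketch of a sparse alternative. It assumes a Mahout version that provides `Vector.nonZeroes()` (older releases expose `iterateNonZero()` instead), and `mahoutToSparse` is a hypothetical name that is not part of the original answer:

```scala
import scala.collection.JavaConverters._

import org.apache.mahout.math.VectorWritable
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper (not from the original answer): keeps only the
// non-zero entries instead of materializing every element.
def mahoutToSparse(v: VectorWritable): Vector = {
  val mahoutVector = v.get
  // Mahout iterators may reuse one Element object, so index and value are
  // copied into an immutable tuple as soon as each element is visited.
  val entries = mahoutVector.nonZeroes.iterator.asScala
    .map(e => (e.index, e.get))
    .toVector
  // This overload of Vectors.sparse sorts the (index, value) pairs itself,
  // so the iteration order of the Mahout vector does not matter.
  Vectors.sparse(mahoutVector.size, entries)
}
```

It can be used in place of the dense helper, for example `rdd.map { case (k, v) => (k.toString, mahoutToSparse(v)) }`. Whether it pays off depends on how sparse the data actually is.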