How to convert Mahout VectorWritable to Vector in Spark
I have a VectorWritable (org.apache.mahout.math.VectorWritable) coming from a sequence file generated by Mahout, and I want to convert it to Spark's Vector type (org.apache.spark.mllib.linalg.Vectors). How can I do this in Scala?
Assuming we have an RDD[(Text, VectorWritable)] from your previous question:
import scala.collection.JavaConverters.iterableAsScalaIterableConverter

def mahoutToScala(v: org.apache.mahout.math.VectorWritable): org.apache.spark.mllib.linalg.Vector = {
  // Mahout's Vector.all iterates every element (including zeros);
  // Element.get returns the element's value as a Double
  val scalaArray = v.get.all.asScala.map(_.get).toArray
  org.apache.spark.mllib.linalg.Vectors.dense(scalaArray)
}

rdd.map { case (k, v) => (k.toString, mahoutToScala(v)) }
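For completeness, here is a minimal sketch of how such an RDD[(Text, VectorWritable)] can be loaded from the Mahout sequence file in the first place. The file path is a placeholder and sc is assumed to be an existing SparkContext; one caveat worth noting is that Spark reuses the same Writable instances across records when reading Hadoop sequence files, so it is safest to convert the values in a map before caching or collecting:

```scala
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable

// "/path/to/mahout-vectors" is a hypothetical path to the Mahout output
val rdd = sc.sequenceFile("/path/to/mahout-vectors", classOf[Text], classOf[VectorWritable])
  // convert eagerly: Spark reuses the underlying Writable objects,
  // so copy the data out (here via mahoutToScala) before caching
  .map { case (k, v) => (k.toString, mahoutToScala(v)) }
```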