How to convert from linalg.Vector to regression.LabeledPoint format?

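I am trying to implement a simple machine learning program in spark-shell. When I tried to feed it a CSV file, it expected libsvm format, so I used the phraug library to convert my dataset into the required format. That worked, but I also needed to normalize my data, so I used StandardScaler to transform it. That also worked fine. The next step was to train the model, for which I used SVMWithSGD. But when I try to train, I keep getting this error: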

    error: type mismatch;
    found: org.apache.spark.rdd.RDD[(Double,org.apache.spark.mllib.linalg.Vector)]
    required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint]

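I understand that this is a type compatibility issue and that Vectors.dense could be used, but I don't want to split the data again. What I don't understand is: isn't there a direct way to convert it so that I can pass it to the train method? P.S. To help you understand the data, it currently looks like this: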

    (0.0,[0.03376345160534202,-0.6339809012492886,-6.719697792783955,-6.719697792783965,-6.30231507117855,-8.72828614492483,0.03884804438718658,0.3041969425433718])
    (0.0,[0.2535328275090413,-0.8780294632355746,-6.719697792783955,-6.719697792783965,-6.30231507117855,-8.72828614492483,0.26407233411369857,0.3041969425433718])

Assuming your RDD[(Double, Vector)] is called vectorRDD:

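    import org.apache.spark.mllib.regression.LabeledPoint

    // Wrap each (label, features) pair in a LabeledPoint
    val labeledPointRDD = vectorRDD.map {
      case (label, vector) => LabeledPoint(label, vector)
    }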
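To show where this conversion sits in the pipeline described in the question, here is a minimal sketch meant to be pasted into spark-shell. It is an illustration under assumptions, not the asker's actual code: the toy data, the `raw` and `scaler` names, the scaler settings, and the iteration count are all made up, and `sc` is assumed to be the SparkContext that spark-shell provides.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical toy data standing in for the converted libsvm file.
// `sc` is the SparkContext created by spark-shell.
val raw = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 10.0)),
  LabeledPoint(1.0, Vectors.dense(3.0, 30.0)),
  LabeledPoint(0.0, Vectors.dense(2.0, 20.0)),
  LabeledPoint(1.0, Vectors.dense(4.0, 40.0))
))

// Fit the scaler on the feature vectors only (settings are illustrative).
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(raw.map(_.features))

// Scaling while keeping the label yields (Double, Vector) pairs,
// which is the shape that triggers the type mismatch.
val vectorRDD = raw.map(p => (p.label, scaler.transform(p.features)))

// Wrap each pair back into a LabeledPoint so that train() accepts it.
val labeledPointRDD = vectorRDD.map {
  case (label, vector) => LabeledPoint(label, vector)
}

val numIterations = 100 // illustrative value
val model = SVMWithSGD.train(labeledPointRDD, numIterations)
```

If the intermediate pairs are not needed elsewhere, the scaling and wrapping steps can be merged into one map, for example `raw.map(p => LabeledPoint(p.label, scaler.transform(p.features)))`, which avoids producing the tuple RDD at all.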