How to convert RDD[org.apache.spark.sql.Row] to RDD[org.apache.spark.mllib.linalg.Vector]
I am trying to convert an RDD[Row] to an RDD[Vector], but it throws an exception saying:
java.lang.ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector
My code is:
val spark = SparkSession.builder().master("local").getOrCreate()
val df = spark.range(0,10).withColumn("uniform" , rand(10L)).withColumn("normal1" , randn(10L)).withColumn("normal2" , randn(11L))
val assembler = new VectorAssembler().setInputCols(Array("uniform" ,"normal1","normal2")).setOutputCol("features")
val dfVec = assembler.transform(df)
val dfOutlier = dfVec.select("id" , "features").union( spark.createDataFrame(Seq( (10 , org.apache.spark.mllib.linalg.Vectors.dense(3,3,3)) )) )
dfOutlier.show(false)
val scaler = new StandardScaler().setInputCol("features").setOutputCol("Scaled").setWithStd(true).setWithMean(true)
val model = scaler.fit(dfOutlier).transform(dfOutlier)
model.show(false)
val dfVecRdd = model.select("Scaled").rdd.map(_(0).asInstanceOf[org.apache.spark.mllib.linalg.Vector] )
When I run an action on dfVecRdd, the exception above is thrown. How can I fix this?
Try removing this import from your code:
org.apache.spark.mllib.linalg.Vector
and importing this instead:
import org.apache.spark.ml.linalg.Vectors
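For reference, here is a minimal sketch of the corrected pipeline with every vector type taken from the ml package: the ml-package VectorAssembler and StandardScaler produce org.apache.spark.ml.linalg vectors, which is why the cast to mllib.linalg.Vector fails. The 10L literal and the getAs[Vector] accessor are adjustments of mine, assuming Spark 2.x:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{rand, randn}
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
import org.apache.spark.ml.linalg.{Vector, Vectors}

val spark = SparkSession.builder().master("local").getOrCreate()

val df = spark.range(0, 10)
  .withColumn("uniform", rand(10L))
  .withColumn("normal1", randn(10L))
  .withColumn("normal2", randn(11L))

val assembler = new VectorAssembler()
  .setInputCols(Array("uniform", "normal1", "normal2"))
  .setOutputCol("features")
val dfVec = assembler.transform(df)

// Build the extra row with the ml Vectors factory so both sides of the
// union carry the same ml.linalg vector type; 10L matches the Long ids
// produced by spark.range.
val dfOutlier = dfVec.select("id", "features")
  .union(spark.createDataFrame(Seq((10L, Vectors.dense(3, 3, 3)))))

val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("Scaled")
  .setWithStd(true)
  .setWithMean(true)
val model = scaler.fit(dfOutlier).transform(dfOutlier)

// Extract the scaled column as ml.linalg.Vector, matching what the
// scaler actually stores in each Row.
val dfVecRdd = model.select("Scaled").rdd.map(_.getAs[Vector](0))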