Using Spark ML's Logistic Regression model on multiclass classification gives error: Column prediction already exists
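I am using Spark ML's Logistic Regression model for a classification problem with 100 classes (0-99). The columns in my dataset are "_c0, _c1, _c2, _c3, _c4, _c5",
where _c5 is the target variable and the rest are features. My code is as follows: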
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.classification.OneVsRest

// Index each string column (features _c0.._c4 and the label _c5) to numeric indices
val _c0Indexer = new StringIndexer().setInputCol("_c0").setOutputCol("_c0Index")
val _c1Indexer = new StringIndexer().setInputCol("_c1").setOutputCol("_c1Index")
val _c2Indexer = new StringIndexer().setInputCol("_c2").setOutputCol("_c2Index")
val _c3Indexer = new StringIndexer().setInputCol("_c3").setOutputCol("_c3Index")
val _c4Indexer = new StringIndexer().setInputCol("_c4").setOutputCol("_c4Index")
val _c5Indexer = new StringIndexer().setInputCol("_c5").setOutputCol("_c5Index")

// Combine the indexed feature columns into a single vector column
val assembler = new VectorAssembler()
  .setInputCols(Array("_c0Index", "_c1Index", "_c2Index", "_c3Index", "_c4Index"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setLabelCol("_c5Index")
  .setFeaturesCol("features")

// One-vs-rest wrapper around the logistic regression classifier
val ovr = new OneVsRest().setClassifier(lr)

// Note: both ovr and lr are listed as pipeline stages here
val pipeline = new Pipeline().setStages(Array(_c0Indexer, _c1Indexer, _c2Indexer, _c3Indexer, _c4Indexer, assembler, _c5Indexer, ovr, lr))
val model = pipeline.fit(data)
val predictions = model.transform(testdf)
predictions.select("features", "_c5Index", "probability", "prediction").show(5)
But it fails with this error:
requirement failed: Column prediction already exists.
Can someone explain why I am getting this error? Thanks in advance.
Try removing "lr" from the end of the pipeline. I don't think it is needed, because ovr already uses lr.
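To make the suggestion above concrete, here is a minimal sketch of the pipeline with the trailing "lr" stage dropped. The likely cause of the error is that the OneVsRest stage already writes a "prediction" column, so a second LogisticRegression stage after it tries to add the same column again. The local SparkSession, the toy rows, and the `indexer` helper are assumptions added for illustration and are not from the question. The sketch also sets the label and features columns on `ovr` itself, since OneVsRest has its own `labelCol` parameter, and it does not select "probability", because as far as I know OneVsRest does not emit that column.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("ovr-sketch").getOrCreate()
import spark.implicits._

// Hypothetical toy data with the question's column names (3 classes instead of 100).
val data = Seq(
  ("a", "x", "p", "m", "u", "0"),
  ("b", "y", "q", "n", "v", "1"),
  ("c", "z", "r", "o", "w", "2"),
  ("a", "x", "p", "m", "u", "0"),
  ("b", "y", "q", "n", "v", "1"),
  ("c", "z", "r", "o", "w", "2")
).toDF("_c0", "_c1", "_c2", "_c3", "_c4", "_c5")
val testdf = data // the sketch reuses the training rows as test data

// Hypothetical helper: index column `c` into `<c>Index`.
def indexer(c: String): StringIndexer =
  new StringIndexer().setInputCol(c).setOutputCol(c + "Index")

val assembler = new VectorAssembler()
  .setInputCols(Array("_c0Index", "_c1Index", "_c2Index", "_c3Index", "_c4Index"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// OneVsRest carries its own label/features params, so set them here.
val ovr = new OneVsRest()
  .setClassifier(lr)
  .setLabelCol("_c5Index")
  .setFeaturesCol("features")

// Only ovr is a stage; lr is used inside it and is not listed again.
val pipeline = new Pipeline().setStages(Array(
  indexer("_c0"), indexer("_c1"), indexer("_c2"), indexer("_c3"), indexer("_c4"),
  assembler, indexer("_c5"), ovr))

val model = pipeline.fit(data)
val predictions = model.transform(testdf)
predictions.select("features", "_c5Index", "prediction").show(5)
```

With the real dataset, replace the toy `data` and `testdf` with your own DataFrames; the stage list is the part that matters.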