Unable to save XGBoost model with Spark

I am training and saving an XGBoost model as shown below:

XGBoost Version 0.82

Spark Version 2.4.2

Get the model (calls the training function)

def getModel(trainingData: DataFrame): PipelineModel = {
  val pipelineModel = train(trainingData)

  // Persist the fitted pipeline only when a target path is configured
  if (modelPathToSave != "") {
    pipelineModel.write.overwrite().save(modelPathToSave)
    println(s"Saved model to $modelPathToSave")
  }
  pipelineModel
}

Train the model

def train(trainingData: DataFrame): PipelineModel = {
    val nh = new NullHandler()
      .setCols(hackyEncode(featureList))
      .setMethod("fill")

    val va = new VectorAssembler()
      .setInputCols(hackyDecode(nh.getCols).toArray)
      .setOutputCol(featuresCol)

    val xgb = new XGBoostClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setEta(0.3)
      .setMaxDepth(8)
      .setObjective("binary:logistic")
      .setEvalMetric("auc")  
      .setScalePosWeight(9)

    val pipeline = new Pipeline()
        .setStages(Array[PipelineStage](nh, va, xgb))

    pipeline.fit(trainingData)
}

But I get this error:

Exception in thread "main" java.lang.NoSuchMethodError: shaded.json4s.jackson.JsonMethods$.parse(Lshaded/json4s/JsonInput;Z)Lshaded/json4s/JsonAST$JValue;
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$$anonfun.apply(DefaultXGBoostParamsWriter.scala:73)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$$anonfun.apply(DefaultXGBoostParamsWriter.scala:71)

This happens even though json4s is listed in my build.sbt file:

  "org.json4s" %% "json4s-native" % "3.5.1",
  "org.json4s" %% "json4s-jackson" % "3.6.6", 

Can anyone help?

XGBoost version 0.82 is not compatible with Spark 2.4. You can either downgrade to Spark 2.3 or use XGBoost version 0.90.
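If you go the upgrade route, the change in build.sbt might look roughly like the sketch below. The question does not show how XGBoost is declared in the build, so the existing 0.82 line is an assumption, as are the `ml.dmlc` / `xgboost4j-spark` coordinates; check both against your own build and your Scala version before relying on them.

```scala
// build.sbt (sketch, not the asker's actual file)
// Assumption: XGBoost4J-Spark is currently declared as
//   "ml.dmlc" % "xgboost4j-spark" % "0.82"
// Bump only the version; the artifact name is assumed to have no
// Scala-version suffix, hence % rather than %%.
libraryDependencies += "ml.dmlc" % "xgboost4j-spark" % "0.90"
```

After changing the version, rebuild the application jar so the old 0.82 classes are no longer on the classpath, then rerun the save step.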

Reference:

https://discuss.xgboost.ai/t/xgboost-0-8-2-and-spark-2-4-0-unable-to-save-pipeline-model-into-aws-s3/838