Setting the scalePosWeight parameter for a Spark XGBoost model in a CV grid

I am trying to tune my XGBoost model on Spark using Scala. My XGBoost parameter grid is as follows:

val xgbParamGrid = (new ParamGridBuilder()
                .addGrid(xgb.maxDepth, Array(8, 16))
                .addGrid(xgb.minChildWeight, Array(0.5, 1, 2))
                .addGrid(xgb.alpha, Array(0.8, 0.9, 1))
                .addGrid(xgb.lambda, Array(0.8, 1, 2))
                .addGrid(xgb.scalePosWeight, Array(1, 5, 9))
                .addGrid(xgb.subSample, Array(0.5, 0.8, 1))
                .addGrid(xgb.eta, Array(0.01, 0.1, 0.3, 0.5))
                .build())

The cross-validator is set up as follows:

val evaluator = (new BinaryClassificationEvaluator()
                      .setLabelCol("label")
                      .setRawPredictionCol("prediction")
                      .setMetricName("areaUnderPR"))

val cv = (new CrossValidator()
          .setEstimator(pipeline_model_xgb)
          .setEvaluator(evaluator)
          .setEstimatorParamMaps(xgbParamGrid)
          .setNumFolds(10))

val xgb_model = cv.fit(train)

I get the following error, and only for the scalePosWeight parameter:

error: type mismatch;
found   : org.apache.spark.ml.param.DoubleParam
required: org.apache.spark.ml.param.Param[AnyVal]
Note: Double <: AnyVal (and org.apache.spark.ml.param.DoubleParam <: org.apache.spark.ml.param.Param[Double]), but class Param is invariant in type T.
You may wish to define T as +T instead. (SLS 4.5)
                              .addGrid(xgb.scalePosWeight, Array(1, 5, 9))
                                           ^

From what I have found, the message "You may wish to define T as +T instead" is common, but I am not sure how to resolve it here. Thanks for your help!

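I ran into the same problem when setting the array for minChildWeight, in a case where the array consisted only of Int values. The solution that worked (for both scalePosWeight and minChildWeight) is to pass an array of Double values: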

.addGrid(xgb.scalePosWeight, Array(1.0, 5.0, 9.0))
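
As for why this happens, my understanding is that ParamGridBuilder has a generic addGrid[T](param: Param[T], values: Iterable[T]) plus an overload that takes a DoubleParam with an Array[Double]. Array(1, 5, 9) is typed as Array[Int], so the Double overload does not apply, and the generic one cannot find a single T because Param is invariant. A mixed array such as Array(0.5, 1, 2) is already inferred as Array[Double], which would explain why only the all-Int arrays fail. Below is a minimal sketch of the same situation that runs without Spark. Param, DoubleParam and GridBuilder here are hypothetical stand-ins I wrote to mirror the shape of the Spark classes, not the real ones:

```scala
// Hypothetical stand-ins mirroring the shape of Spark's param classes;
// these are NOT the real org.apache.spark.ml types.
class Param[T](val name: String) // invariant in T, like Spark's Param
class DoubleParam(name: String) extends Param[Double](name)

class GridBuilder {
  private val grid =
    scala.collection.mutable.LinkedHashMap.empty[Param[_], Iterable[_]]

  // Generic overload: the element type must match the param type exactly.
  def addGrid[T](param: Param[T], values: Iterable[T]): this.type = {
    grid.put(param, values)
    this
  }

  // Double-specific overload, selected when the values are an Array[Double].
  def addGrid(param: DoubleParam, values: Array[Double]): this.type =
    addGrid[Double](param, values)

  // Number of candidate values registered per parameter name.
  def sizes: Map[String, Int] =
    grid.map { case (p, v) => p.name -> v.size }.toMap
}

object GridDemo {
  val scalePosWeight = new DoubleParam("scalePosWeight")
  val minChildWeight = new DoubleParam("minChildWeight")

  // new GridBuilder().addGrid(scalePosWeight, Array(1, 5, 9))
  // ^ Array[Int]: rejected by the compiler with a type mismatch.

  val builder = new GridBuilder()
    .addGrid(scalePosWeight, Array(1.0, 5.0, 9.0))           // Double literals
    .addGrid(minChildWeight, Array(1, 2, 4).map(_.toDouble)) // convert existing Ints

  def main(args: Array[String]): Unit = println(builder.sizes)
}
```

If the candidate values already live in an Array[Int], converting them with .map(_.toDouble) as in the second call should work just as well as rewriting the literals.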