How to infer the predicted class label from the raw score computed by Spark MLlib

Question

I was reading the following Spark documentation:

https://spark.apache.org/docs/latest/mllib-optimization.html

The example code snippet for binary classification prediction is as follows:

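    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.linalg.Vectors

    // The last element of weightsWithIntercept is the intercept; the rest are the weights.
    val model = new LogisticRegressionModel(
      Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1)),
      weightsWithIntercept(weightsWithIntercept.size - 1))

    // Clear the default threshold.
    model.clearThreshold()

    // Compute raw scores on the test set.
    val scoreAndLabels = test.map { point =>
      val score = model.predict(point.features)
      (score, point.label)
    }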

As you can see, model.predict(point.features) returns the raw score, which is the margin of the distance to the separating hyperplane.

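My question is:

(1) Given the raw score computed above, how do I know whether the predicted class label is 0 or 1?

Or

(2) How do I infer the predicted class label (0 or 1) from the raw score computed above?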

Answer 1

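By default the threshold is 0.5, so when using BinaryClassificationMetrics, a score below 0.5 gives class label 0 and a higher score gives class label 1. You can apply the same rule yourself to infer the class from the score.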
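A minimal sketch of that rule in plain Scala, without Spark, so it can be run on its own. The helper name labelFromScore and the sample pairs are made up for illustration; they are not part of MLlib. It assumes the score is on a 0 to 1 scale (which, as far as I know, is what LogisticRegressionModel returns once the threshold is cleared) and that a score exactly equal to the threshold maps to 0.

```scala
// Hypothetical helper (not part of MLlib): map a score in [0, 1] to a 0/1 label.
// Scores strictly above the threshold become 1.0, everything else becomes 0.0.
def labelFromScore(score: Double, threshold: Double = 0.5): Double =
  if (score > threshold) 1.0 else 0.0

// Made-up (score, trueLabel) pairs standing in for scoreAndLabels.collect().
val sampleScoreAndLabels = Seq((0.91, 1.0), (0.12, 0.0), (0.50, 1.0), (0.67, 0.0))

// Pair each inferred label with the true label.
val predictionAndLabels = sampleScoreAndLabels.map { case (score, label) =>
  (labelFromScore(score), label)
}

predictionAndLabels.foreach(println)
```

On the RDD from the question, the same mapping would be scoreAndLabels.map { case (score, label) => (labelFromScore(score), label) }. Alternatively, if you do not call model.clearThreshold() (or you call model.setThreshold(0.5) afterwards), model.predict should return the 0/1 label directly instead of the score.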

scala

apache-spark

apache-spark-ml

apache-spark-mllib