Scala java.io toArray error when zipping feature importance vector to column names array

Question

When I try to zip the feature importance vector returned by LightGBM's getFeatureImportances with the array of column names, I get the error shown below:

import com.microsoft.ml.spark.LightGBMClassificationModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

def getFeatureImportances(inputContainer: PipelineModelContainer): (String, String) = {
    val transformer = inputContainer.pipelineModel.stages.last

    val featureImportancesVector = inputContainer.params match {
        case RandomForestParameters(numTrees, treeDepth, featureTransformer) =>
            transformer.asInstanceOf[RandomForestClassificationModel].featureImportances
        case LightGBMParameters(treeDepth, numLeaves, iterations, featureTransformer) => 
            transformer.asInstanceOf[LightGBMClassificationModel].getFeatureImportances("split")
    }

    val colNames = inputContainer.featureColNames
    val sortedFeatures = (colNames zip featureImportancesVector.toArray).sortWith(_._2 > _._2).zipWithIndex
}

I get this error on the last line of the code above:

value toArray is not a member of java.io.Serializable

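It seems the LightGBM feature importances cannot be converted to an array. The code works fine if it only handles the RandomForestClassifier feature importances. What else can I do?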

Answer 1

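Of the two branches of the match block, one returns Array[Double] (the LightGBM getFeatureImportances call) and the other returns Vector (the RandomForest featureImportances member).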

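The common supertype of those two types is java.io.Serializable, so that is the type Scala infers for the variable featureImportancesVector. toArray is not available on that type, even though both concrete types define the method.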

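This is easy to fix: as suggested in the comments, move .toArray onto featureImportances in the RandomForest branch, so that the type of both branches, and therefore of the variable, becomes Array[Double].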
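
The fix can be sketched without Spark on the classpath. Everything in the snippet below is a hypothetical stand-in, not the real API: FakeVector plays the role of Spark's Vector, ModelKind replaces the question's parameter case classes, and the importance values are made up. It only illustrates that once each branch ends in an Array[Double], the zip and sort from the question type-check.

```scala
// Hypothetical stand-in for org.apache.spark.ml.linalg.Vector,
// so the sketch compiles without Spark.
final case class FakeVector(values: Array[Double]) extends Serializable {
  def toArray: Array[Double] = values
}

// Hypothetical replacement for the RandomForestParameters / LightGBMParameters match.
sealed trait ModelKind
case object RandomForestKind extends ModelKind
case object LightGBMKind extends ModelKind

object FeatureImportanceSketch {
  // Both branches now yield Array[Double], so the whole match is Array[Double].
  def importances(kind: ModelKind): Array[Double] = kind match {
    // Vector-like result: convert inside the branch.
    case RandomForestKind => FakeVector(Array(0.2, 0.7, 0.1)).toArray
    // Already an Array[Double]: nothing to convert.
    case LightGBMKind     => Array(3.0, 9.0, 1.0)
  }

  // Same zip / sort / index chain as in the question.
  def ranked(colNames: Array[String], kind: ModelKind): Array[((String, Double), Int)] =
    (colNames zip importances(kind)).sortWith(_._2 > _._2).zipWithIndex
}
```

In the original function this corresponds to appending .toArray to the RandomForestClassificationModel branch and dropping it from the last line.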

java

scala

java-io

apache-spark

lightgbm