How do I load a trained RandomForestClassificationModel?

Question

I trained and tested an ML model (GBTClassificationModel or RandomForestClassificationModel). I then wanted to save the trained model for future use, so I did the following:

 model.save("...");

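Take the saved GBTClassificationModel as an example. The saved output is a directory containing "data, metadata and treesMetadata". My question is how to use this saved model later. For example, I would like to do something like the following: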

 model = spark.load("...");
 Dataset<Row> predict_data = model.transform(dataset_test1);

Any suggestions? Thanks.

Update:

It turned out to be simple:

 GBTClassificationModel model1 = GBTClassificationModel.load("...");
 Dataset<Row> predict_data = model1.transform(dataset_test);

Answer 1

You should use the RandomForestClassificationModel.load method.

load(path: String): RandomForestClassificationModel
Reads an ML instance from the input path, a shortcut of read.load(path).

In Scala, your case would look like this:

import org.apache.spark.ml.classification.RandomForestClassificationModel
val model = RandomForestClassificationModel.load("/analytics_shared/qoe/km_model")
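
For a fuller picture, here is a minimal round-trip sketch, assuming Spark's ML library is on the classpath and a local master is acceptable. The object name, the toy data, and the temporary path are hypothetical and exist only to make the example self-contained. The model is written with save, read back with RandomForestClassificationModel.load, and then used through transform.

```scala
import java.nio.file.Files

import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of the original question or answer.
object LoadRandomForestExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadRandomForestExample")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    // Toy training data (made up): default column names "label" and "features".
    val training = Seq(
      (0.0, Vectors.dense(0.0, 1.0)),
      (0.0, Vectors.dense(0.1, 0.9)),
      (1.0, Vectors.dense(1.0, 0.0)),
      (1.0, Vectors.dense(0.9, 0.2))
    ).toDF("label", "features")

    val trained = new RandomForestClassifier().setNumTrees(5).fit(training)

    // Hypothetical target: a not-yet-existing subdirectory of a fresh temp dir.
    val path = Files.createTempDirectory("rf-model").resolve("model").toString
    trained.save(path)

    // Load through the model's companion object, then predict as usual.
    val loaded = RandomForestClassificationModel.load(path)
    loaded.transform(training).select("features", "prediction").show()

    spark.stop()
  }
}
```

The same pattern should apply to GBTClassificationModel: the class you call load on has to match the class of the model that was saved.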

I strongly recommend using the ML Pipeline feature of Spark MLlib:

ML Pipelines provide a uniform set of high-level APIs built on top of DataFrames that help users create and tune practical machine learning pipelines.

With ML Pipelines, you simply replace RandomForestClassificationModel with PipelineModel.

That makes it much simpler:

import org.apache.spark.ml.PipelineModel
val model = PipelineModel.load("...")
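
To show what that looks like end to end, here is a sketch that fits a two-stage pipeline, saves it, and loads it back as a PipelineModel. The column names x1 and x2, the toy rows, the object name, and the temporary path are all assumptions made for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of the original answer.
object PipelineSaveLoadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PipelineSaveLoadExample")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    // Toy data (made up): two numeric inputs and a binary label.
    val training = Seq(
      (0.0, 1.0, 0.0),
      (0.1, 0.9, 0.0),
      (1.0, 0.0, 1.0),
      (0.9, 0.2, 1.0)
    ).toDF("x1", "x2", "label")

    // Stage 1: assemble the raw columns into a single feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")

    // Stage 2: the classifier itself.
    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(5)

    val fitted = new Pipeline().setStages(Array(assembler, rf)).fit(training)

    // Hypothetical location; overwrite() replaces anything already at the path.
    val path = Files.createTempDirectory("pipeline-model").resolve("model").toString
    fitted.write.overwrite().save(path)

    // One load call restores every stage, so callers need not know the model type.
    val loaded = PipelineModel.load(path)
    loaded.transform(training).select("x1", "x2", "prediction").show()

    spark.stop()
  }
}
```

Because the feature preparation is saved along with the classifier, the loaded pipeline can be applied directly to raw rows, and swapping the classifier for a GBTClassifier would not change the loading code.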

java

apache-spark

apache-spark-mllib