Spark ML decision tree classifier calls random forest methods

Question

I am running the following code, which uses a classifier from the ML library:

import org.apache.spark.ml.classification.DecisionTreeClassifier

val decisionTree = new DecisionTreeClassifier().setLabelCol("label").setFeaturesCol("features").setMaxDepth(7).setImpurity("gini")
val model = decisionTree.fit(df3)
val prediction = model.transform(df3)

When I look at the Spark history, I see the following:

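Why does my single decision tree use randomForest methods? Am I doing something wrong? Also, why do some tasks take so much longer than others? (If there is anything I can do to speed this up, I would like to know.)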

The ML documentation does not say much about this...

Answer 1

Random forests are ensembles of decision trees

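So this is the same thing as a decision tree: Spark ML trains a single DecisionTreeClassifier through its RandomForest code path, as a forest containing one tree, which is why randomForest methods show up in the history. You are not doing anything wrong. Reducing the maximum depth from 7 to 1 would make training take less time, but the model would underfit. Training time also depends on how much memory is available.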
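The depth and memory trade-offs mentioned above map onto parameters of DecisionTreeClassifier. Below is a minimal sketch, not a tuned configuration. The toy DataFrame stands in for df3, and the specific values (maxDepth 5, maxBins 32, 512 MB) are illustrative assumptions rather than recommendations. Caching the input and raising maxMemoryInMB may help, but the effect depends on the data and the cluster.

```scala
import org.apache.spark.ml.classification.{DecisionTreeClassificationModel, DecisionTreeClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object DecisionTreeTuningSketch {

  // Trains a small tree on hypothetical toy data and returns the fitted model.
  def train(spark: SparkSession): DecisionTreeClassificationModel = {
    import spark.implicits._

    // Hypothetical stand-in for df3; cached because tree training
    // makes several passes over the input.
    val df3 = Seq(
      (0.0, Vectors.dense(0.0, 1.0)),
      (1.0, Vectors.dense(1.0, 0.0)),
      (0.0, Vectors.dense(0.1, 0.9)),
      (1.0, Vectors.dense(0.9, 0.2))
    ).toDF("label", "features").cache()

    val decisionTree = new DecisionTreeClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setMaxDepth(5)          // shallower than 7: less work, more risk of underfitting
      .setMaxBins(32)          // fewer bins means fewer candidate splits per feature
      .setMaxMemoryInMB(512)   // memory budget for histogram aggregation (illustrative value)
      .setCacheNodeIds(true)   // can speed up training of deeper trees
      .setImpurity("gini")

    decisionTree.fit(df3)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DecisionTreeTuningSketch")
      .master("local[*]")
      .getOrCreate()

    val model = train(spark)
    println(s"depth=${model.depth}, nodes=${model.numNodes}")

    spark.stop()
  }
}
```

Comparing job durations in the Spark history for a few maxDepth values shows how much time each additional level costs on your own data.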

scala

machine-learning

apache-spark

apache-spark-mllib