Inconclusive RandomForest documentation in scikit-learn

In the scikit-learn documentation on ensemble methods, http://scikit-learn.org/stable/modules/ensemble.html#id6, section 1.9.2.3 (Parameters), we read:

(...) The best results are also usually reached when setting max_depth=None in combination with min_samples_split=1 (i.e., when fully developing the trees). Bear in mind though that these values are usually not optimal. The best parameter values should always be cross-validated.

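So what is the difference between "best results" and "optimal"? I assumed that by best results the author means the best cross-validated prediction results.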

In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while the default strategy is to use the original dataset for building extra-trees (bootstrap=False).

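This is how I read it: bootstrapping is used by default in scikit-learn's implementation, but the default strategy is not to use bootstrapping. If so, where does that default strategy come from, and why is it not the default in the implementation?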

I agree that the first quote is self-contradictory. Perhaps the following would be better:

The best results are also often reached with fully developed trees (max_depth=None and min_samples_split=1). Bear in mind though that these values are usually not guaranteed to be optimal. The best parameter values should always be cross-validated.
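To see why "often best" and "optimal" are not the same thing, the values can simply be cross-validated against a few alternatives. This is a minimal sketch, not part of the documentation: the synthetic dataset, the grid values, and the small n_estimators are assumptions chosen only to keep the example fast. Recent scikit-learn releases require an integer min_samples_split to be at least 2, so 2 stands in here for the min_samples_split=1 of the quote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# max_depth=None with min_samples_split=2 corresponds to fully developed trees.
param_grid = {
    "max_depth": [None, 3, 5],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation
)
search.fit(X, y)

# The winning combination may or may not be the fully developed setting.
print(search.best_params_, round(search.best_score_, 3))
```

Whichever combination wins on a given dataset, the point of the quote is that it has to be checked this way rather than assumed.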

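As for the second quote, it compares the default value of the bootstrap parameter for random forests (RandomForestClassifier and RandomForestRegressor) with the default for extremely randomized trees, as implemented in ExtraTreesClassifier and ExtraTreesRegressor. The following might be more explicit: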

In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while for building extra-trees the default strategy is to use the original dataset (bootstrap=False).
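The two defaults can also be read straight off the estimators. A small sketch, assuming only that scikit-learn is installed; the `defaults` dictionary is just a name introduced for this illustration:

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    ExtraTreesRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
)

estimators = (
    RandomForestClassifier,
    RandomForestRegressor,
    ExtraTreesClassifier,
    ExtraTreesRegressor,
)

# Instantiate each estimator with no arguments and read its default bootstrap value.
defaults = {cls.__name__: cls().get_params()["bootstrap"] for cls in estimators}

for name, value in defaults.items():
    print(f"{name}: bootstrap={value}")
```

So there is one default per estimator family, not a "default strategy" that differs from the implementation.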

If you find these formulations easier to understand, please feel free to submit a PR with the fix.

python

random-forest

scikit-learn