Inconclusive RandomForest documentation in scikit-learn

In the scikit-learn documentation on ensemble methods (http://scikit-learn.org/stable/modules/ensemble.html#id6), section 1.9.2.3, Parameters, we read:

(...) The best results are also usually reached when setting max_depth=None in combination with min_samples_split=1 (i.e., when fully developing the trees). Bear in mind though that these values are usually not optimal. The best parameter values should always be cross-validated.

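So what is the difference between "best results" and "optimal"? I assumed that by best results the author means the best cross-validated prediction results.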

In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while the default strategy is to use the original dataset for building extra-trees (bootstrap=False).

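This is how I read it: bootstrapping is used by default in scikit-learn's implementation, but the default strategy is not to use bootstrapping. If so, where does the default strategy come from, and why is it not the default in the implementation?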

I agree that the first sentence is self-contradictory. Perhaps the following would be better:

The best results are also often reached with fully developed trees (max_depth=None and min_samples_split=1). Bear in mind though that these values are usually not guaranteed to be optimal. The best parameter values should always be cross-validated.
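To make "should always be cross-validated" concrete, here is a minimal sketch that compares fully developed trees against shallower ones with a grid search. The synthetic dataset, the grid values, and the number of trees are assumptions chosen only for illustration. Note that recent scikit-learn releases require min_samples_split to be at least 2, so the grid uses 2 for fully developed trees in place of the 1 in the quoted text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used here only as a stand-in for a real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# max_depth=None with min_samples_split=2 means fully developed trees;
# the other values are arbitrary alternatives to compare against.
param_grid = {
    "max_depth": [None, 3, 5],
    "min_samples_split": [2, 10],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation
)
search.fit(X, y)

# The combination with the best mean cross-validated accuracy.
print(search.best_params_)
print(round(search.best_score_, 3))
```

Whether the fully developed setting wins depends on the data, which is the point of the reworded sentence.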

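As for the second quote, it compares the default value of the bootstrap parameter for random forests (RandomForestClassifier and RandomForestRegressor) with that of extremely randomized trees as implemented in ExtraTreesClassifier and ExtraTreesRegressor. The following might be more explicit: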

In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while for building extra-trees the default strategy is to use the original dataset (bootstrap=False).

If you find these formulations easier to understand, feel free to submit a PR with the fix.
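The bootstrap defaults discussed in the second quote can be checked directly. This is a small sketch that reads the default from each estimator via get_params(); it assumes an installed scikit-learn whose defaults still match the quoted documentation.

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    ExtraTreesRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
)

estimators = (
    RandomForestClassifier,
    RandomForestRegressor,
    ExtraTreesClassifier,
    ExtraTreesRegressor,
)

# Instantiate each estimator with no arguments and read its default.
defaults = {est.__name__: est().get_params()["bootstrap"] for est in estimators}

for name, value in defaults.items():
    print(f"{name}: bootstrap={value}")
```

So there is a single default per estimator class: random forests resample the training set for each tree, while extra-trees fit each tree on the original dataset unless bootstrap=True is passed explicitly.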