Inconclusive RandomForest documentation in ScikitLearn
In the ensemble methods documentation of Scikit-Learn, http://scikit-learn.org/stable/modules/ensemble.html#id6, section 1.9.2.3. Parameters, we read:
(...) The best results are also usually reached when setting max_depth=None in combination with min_samples_split=1 (i.e., when fully developing the trees). Bear in mind though that these values are usually not optimal. The best parameter values should always be cross-validated.
So what is the difference between "best" results and "optimal" results? I assume that by best results the author means the best cross-validated prediction results.
In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while the default strategy is to use the original dataset for building extra-trees (bootstrap=False).
This is how I read it: bootstrapping is used by default in Scikit-Learn's implementation, yet the default strategy is not to use bootstrapping. If that is the case, where does this default strategy come from, and why is it not the default in the implementation?
I agree that the first sentence is self-contradictory. Maybe the following would be better:
The best results are also often reached with fully developed trees (max_depth=None and min_samples_split=1). Bear in mind though that these values are usually not guaranteed to be optimal. The best parameter values should always be cross-validated.
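As a concrete illustration of what "cross-validated" means here, a minimal sketch using GridSearchCV on a toy dataset (the grid values are purely illustrative, and recent scikit-learn releases require min_samples_split >= 2, so 2 stands in for the "fully developed" setting):

```python
# A sketch of cross-validating max_depth / min_samples_split for a random forest.
# The grid values below are illustrative only.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid,
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Whatever defaults the documentation suggests, it is this kind of search that decides which values are actually best for a given dataset.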
As for the second quote, it compares the default value of the bootstrap parameter of random forests (RandomForestClassifier and RandomForestRegressor) with that of the extremely randomized trees implemented in ExtraTreesClassifier and ExtraTreesRegressor. The following might be more explicit:
In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while for building extra-trees the default strategy is to use the original dataset (bootstrap=False).
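The comparison can also be checked directly on the estimators' defaults; a quick sketch (not from the original thread, the printed values reflect the scikit-learn defaults described above):

```python
# Check the default of the `bootstrap` parameter on both ensemble classes.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

print(RandomForestClassifier().bootstrap)  # True  -> bootstrap samples are drawn
print(ExtraTreesClassifier().bootstrap)    # False -> the whole original dataset is used
```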
Feel free to submit a PR with the fix if you find these formulations easier to understand.