Can we set minimum samples per leaf in XGBoost (like in other GBM algos)?

I'm curious why the sklearn interface of XGBoost doesn't support a min_samples_leaf parameter like the classic GB classifier does. If I do want to control the minimum number of samples in a single leaf, is there any workaround in xgboost?

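xgboost has min_child_weight, but outside of ordinary regression tasks that is indeed not the same as a minimum sample count. I can't say why the additional parameter isn't included. Note, though, that in binary classification the logloss hessian is p(1-p), which lies between 0 and 1/4 and approaches zero for very confident predictions; so in effect, setting min_child_weight requires each leaf to contain many rows that are currently uncertain, which may be close enough to (or better than!) setting a minimum number of rows.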
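To make the p(1-p) argument concrete, here is a minimal sketch in plain Python. It does not call xgboost; the leaf contents and the threshold value are made-up numbers for illustration, and the comparison against the threshold is my reading of the documented "minimum sum of hessian" rule rather than the library's actual code.

```python
def logloss_hessian(p):
    # Second derivative of binary log loss with respect to the raw score,
    # written in terms of the predicted probability p.
    return p * (1.0 - p)

# Two hypothetical leaves with the same number of rows (20 each).
uncertain_leaf = [0.5] * 20    # model is unsure about every row
confident_leaf = [0.99] * 20   # model is already very confident

uncertain_sum = sum(logloss_hessian(p) for p in uncertain_leaf)  # 20 * 0.25
confident_sum = sum(logloss_hessian(p) for p in confident_leaf)  # 20 * 0.0099

# Illustrative threshold, standing in for min_child_weight.
min_child_weight = 1.0

print(uncertain_sum >= min_child_weight)  # the uncertain leaf passes
print(confident_sum >= min_child_weight)  # the confident leaf does not
```

Both leaves hold 20 rows, yet only the uncertain one clears the threshold, which is why min_child_weight behaves like "minimum number of still-uncertain rows" rather than a plain row count.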

You can try using min_child_weight. According to the documentation, this parameter is the:

minimum sum of instance weight (hessian) needed in a child.

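For regression problems with MSE loss, the sum of instance weights amounts to a minimum number of samples per leaf node (assuming unit sample weights), because the second derivative of the MSE loss equals 1.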
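A small sketch of the MSE case, again in plain Python without xgboost. It assumes the loss is written as 0.5 * (y_pred - y_true)**2 and that every row has unit sample weight; the leaf data and the threshold are hypothetical.

```python
def mse_hessian(y_true, y_pred):
    # Second derivative of 0.5 * (y_pred - y_true)**2 with respect to y_pred.
    # It is constant, so it does not depend on the target or the prediction.
    return 1.0

def hessian_sum(targets, predictions):
    # Sum of per-row hessians in a leaf, assuming unit sample weights.
    return sum(mse_hessian(y, p) for y, p in zip(targets, predictions))

# Hypothetical leaf with five rows.
leaf_targets = [3.0, 1.5, 4.2, 0.7, 2.9]
leaf_predictions = [2.5, 2.5, 2.5, 2.5, 2.5]

leaf_hessian_sum = hessian_sum(leaf_targets, leaf_predictions)

# Illustrative threshold, standing in for min_child_weight.
min_child_weight = 10
leaf_is_allowed = leaf_hessian_sum >= min_child_weight

print(leaf_hessian_sum, len(leaf_targets))  # the sum equals the row count
print(leaf_is_allowed)
```

Because each row contributes exactly 1, a threshold of 10 here acts like requiring at least 10 rows per leaf, so this five-row leaf would be rejected.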

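For classification problems, it yields a different measure that characterizes the purity of the samples in a leaf node (e.g., for binary classification, if one class heavily dominates the other in a leaf, there is no need to split it further).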

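I don't know the specific reason why there is no min_samples_leaf parameter. My guess is that its interaction with min_child_weight would create design challenges and confusion for users.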

python

scikit-learn

xgboost