Output something other than "0 pruned nodes"

Question

Every time I use xgboost (not only from Python), every line of the training messages includes "0 pruned nodes". For example:

import pandas as pd
from sklearn import datasets
import xgboost as xgb
iris = datasets.load_iris()
dtrain = xgb.DMatrix(iris.data, label = iris.target)
params = {'max_depth': 10, 'min_child_weight': 0, 'gamma': 0, 'lambda': 0, 'alpha': 0}
bst = xgb.train(params, dtrain)

The output includes a long list of statements such as

[11:08:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 0 pruned nodes, max_depth=5

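I have tried several combinations of tuning parameters, but I always get the "0 pruned nodes" message. How can I produce a situation where some nodes actually get pruned?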

Answer 1

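You get pruned nodes through regularization. Use the gamma parameter!

The objective function contains two parts: training loss and regularization. Regularization in XGBoost is controlled by three parameters: alpha, lambda and gamma (doc):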

alpha [default=0] L1 regularization term on weights, increase this value will make model more conservative.

lambda [default=1] L2 regularization term on weights, increase this value will make model more conservative.

gamma [default=0] minimum loss reduction required to make a further partition on a leaf node of the tree. the larger, the more conservative the algorithm will be. range: [0,∞]

alpha and lambda are just the L1 and L2 penalties on the leaf weights; they are not the parameters that control pruning.

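gamma, however, is the parameter to tune if you want pruned nodes: increase it. Note that the value you need depends on the objective function, and it may take values as high as 10000 or more before any nodes are pruned. Tuning gamma is worth it! It makes XGBoost converge, meaning that after a certain number of iterations the training and test scores stop changing in later iterations (every node of each new tree gets pruned). In the end it is a good switch for controlling overfitting!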

See Introduction to Boosted Trees for the exact definition of gamma.
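For convenience, here is a sketch of where gamma enters, in the notation of that tutorial: $T$ is the number of leaves, $w_j$ the leaf weights, and $G_L, H_L, G_R, H_R$ the sums of first- and second-order gradients of the loss over the instances falling into the left and right child of a candidate split.

```latex
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

\text{Gain} = \frac{1}{2} \left[
    \frac{G_L^2}{H_L + \lambda}
  + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}
\right] - \gamma
```

A split only pays off when the bracketed improvement outweighs gamma. Because $G$ and $H$ are built from the gradients of the chosen loss, their scale changes with the objective and with the amount of data, which is why the gamma needed to prune anything can differ so much from one problem to another.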

python

xgboost