How to reproduce the XGBoost splits with if statements?

Question

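I am using xgboost and need to reproduce its output using only if statements and additions. However, I am not getting the right output.

Let's create some random data: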

import numpy as np
import xgboost as xgb
import os

np.random.seed(42)
data = np.random.rand(100, 5)
label = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(data, label=label)

Then create a basic model:

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}

num_round = 3
bst = xgb.train(param, dtrain, num_round)

Now I save the rules of the boosted trees:

savefile = 'dump.raw.txt'
bst.dump_model(savefile)
os.startfile(savefile)  # opens the dump in the default viewer (Windows only)

This is what I get:

# booster[0]:
# 0:[f3<0.905868173] yes=1,no=2,missing=1
#   1:[f0<0.0309647173] yes=3,no=4,missing=3
#       3:leaf=0.5
#       4:leaf=-0.561797738
#   2:[f3<0.956529975] yes=5,no=6,missing=5
#       5:leaf=0.909090936
#       6:leaf=-0.5
# booster[1]:
# 0:[f2<0.863453388] yes=1,no=2,missing=1
#   1:[f2<0.71782589] yes=3,no=4,missing=3
#       3:leaf=-0.0658661202
#       4:leaf=1.03587329
#   2:[f0<0.345137954] yes=5,no=6,missing=5
#       5:leaf=0.0854885057
#       6:leaf=-1.15627134
# booster[2]:
# 0:[f2<0.46345675] yes=1,no=2,missing=1
#   1:[f2<0.18197903] yes=3,no=4,missing=3
#       3:leaf=-0.321362823
#       4:leaf=1.05848205
#   2:[f3<0.704104543] yes=5,no=6,missing=5
#       5:leaf=-0.623027325
#       6:leaf=0.46367079

My test observation is the first row, i.e. the one scored by bst.predict(dtrain)[0]:

data[0]

array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])

If I sum the leaf values of the splits this row falls into, I get:

-0.5618 + 1.0358 - 0.6230 = -0.1490

It should be 0.48283.

How do I get the right output value?

Answer 1

How do I get the right output value?

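You appear to be working on a binary classification problem (integer labels 0 and 1), so you need to apply the sigmoid function to the boosting score, i.e. to the sum of the leaf values.

Recomputing more precisely: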

import numpy
x = -0.561797738 + 1.03587329 + -0.623027325
1. / (1. + numpy.exp(-x))

... which yields 0.46283075
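To answer the title question more literally, here is a minimal sketch that writes the three dumped trees as plain if statements, adds up the selected leaves, and applies the sigmoid. The function names (tree0, tree1, tree2, predict_proba) are made up for illustration. It assumes the margin offset is zero (a base_score of 0.5) and that no feature value is missing; if your xgboost version estimates base_score from the labels, an extra constant would have to be added to the sum.

```python
import math

# Each function mirrors one booster from the dump: "yes" is the
# branch taken when the comparison is true, "no" otherwise.
# Assumption: no missing values, so the "missing=" branch is ignored.

def tree0(f):
    if f[3] < 0.905868173:
        if f[0] < 0.0309647173:
            return 0.5
        return -0.561797738
    if f[3] < 0.956529975:
        return 0.909090936
    return -0.5

def tree1(f):
    if f[2] < 0.863453388:
        if f[2] < 0.71782589:
            return -0.0658661202
        return 1.03587329
    if f[0] < 0.345137954:
        return 0.0854885057
    return -1.15627134

def tree2(f):
    if f[2] < 0.46345675:
        if f[2] < 0.18197903:
            return -0.321362823
        return 1.05848205
    if f[3] < 0.704104543:
        return -0.623027325
    return 0.46367079

def predict_proba(f):
    # Raw boosting score: sum of one leaf per tree
    # (assumes a zero margin offset, i.e. base_score = 0.5).
    margin = tree0(f) + tree1(f) + tree2(f)
    # binary:logistic maps the raw score through the sigmoid.
    return 1.0 / (1.0 + math.exp(-margin))

row = [0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864]
print(predict_proba(row))  # approximately 0.46283
```

To compare against the library itself, bst.predict(dtrain, output_margin=True)[0] should return the raw score before the sigmoid, which makes it easier to see whether a mismatch comes from the leaf sum or from the final transformation.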
