How to reproduce the XGBOOST splits with if statements?
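I am using xgboost, and I need to reproduce its output with if statements and additions. However, I am not getting the right output.
Let's create some random data: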
import numpy as np
import xgboost as xgb
import os
np.random.seed(42)
data = np.random.rand(100, 5)
label = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(data, label=label)
Then create a basic model:
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
num_round = 3
bst = xgb.train(param, dtrain, num_round)
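Now I save the rules of the boosted trees:
savefile = 'dump.raw.txt'
bst.dump_model(savefile)
os.startfile(savefile)  # opens the dump in the default application (Windows only)
This is what I get: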
# booster[0]:
# 0:[f3<0.905868173] yes=1,no=2,missing=1
#     1:[f0<0.0309647173] yes=3,no=4,missing=3
#         3:leaf=0.5
#         4:leaf=-0.561797738
#     2:[f3<0.956529975] yes=5,no=6,missing=5
#         5:leaf=0.909090936
#         6:leaf=-0.5
# booster[1]:
# 0:[f2<0.863453388] yes=1,no=2,missing=1
#     1:[f2<0.71782589] yes=3,no=4,missing=3
#         3:leaf=-0.0658661202
#         4:leaf=1.03587329
#     2:[f0<0.345137954] yes=5,no=6,missing=5
#         5:leaf=0.0854885057
#         6:leaf=-1.15627134
# booster[2]:
# 0:[f2<0.46345675] yes=1,no=2,missing=1
#     1:[f2<0.18197903] yes=3,no=4,missing=3
#         3:leaf=-0.321362823
#         4:leaf=1.05848205
#     2:[f3<0.704104543] yes=5,no=6,missing=5
#         5:leaf=-0.623027325
#         6:leaf=0.46367079
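My test example is the first row of the data:
data[0]
array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])
If I sum the leaf values that this row falls into (one per tree), I get:
-0.5618 + 1.0358 - 0.6230 = -0.1490
But according to bst.predict(dtrain)[0] it should be 0.48283.
How do I get the right output value?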
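It looks like you are dealing with a binary classification problem (integer labels of 0 and 1), so you need to apply the sigmoid function to the boosted score. The sum of the leaf values is a raw score on the log-odds scale, not a probability.
Recomputing with more precision: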
import numpy
x = -0.561797738 + 1.03587329 + -0.623027325
1. / (1. + numpy.exp(-x))
.. yields 0.46283075
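To make the whole procedure concrete, here is a minimal sketch that hand-translates the three dumped trees into if statements, adds up the leaf values, and applies the sigmoid. The function names (`tree0`, `tree1`, `tree2`, `predict_proba`) are made up for illustration. The sketch assumes the features contain no missing values (a NaN would have to follow the `missing=` branch from the dump, which a plain `<` comparison does not do) and that the model's `base_score` is 0.5, so that it adds nothing on the log-odds scale. If your xgboost version uses a different base score, its logit would have to be added to the margin.

```python
import math

def tree0(f):
    # booster[0]
    if f[3] < 0.905868173:
        if f[0] < 0.0309647173:
            return 0.5
        return -0.561797738
    if f[3] < 0.956529975:
        return 0.909090936
    return -0.5

def tree1(f):
    # booster[1]
    if f[2] < 0.863453388:
        if f[2] < 0.71782589:
            return -0.0658661202
        return 1.03587329
    if f[0] < 0.345137954:
        return 0.0854885057
    return -1.15627134

def tree2(f):
    # booster[2]
    if f[2] < 0.46345675:
        if f[2] < 0.18197903:
            return -0.321362823
        return 1.05848205
    if f[3] < 0.704104543:
        return -0.623027325
    return 0.46367079

def predict_proba(f):
    # Raw boosted score (log-odds); assumes base_score = 0.5, i.e. a zero offset.
    margin = tree0(f) + tree1(f) + tree2(f)
    # The sigmoid maps the log-odds to a probability, as binary:logistic does.
    return 1.0 / (1.0 + math.exp(-margin))

# First row of the data from the question (no missing values assumed).
row = [0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864]
print(predict_proba(row))
```

For this row the three trees select the leaves -0.561797738, 1.03587329 and -0.623027325, so the result matches the sigmoid value computed above. To check the intermediate sum against the model itself, `bst.predict(dtrain, output_margin=True)` returns the raw score before the sigmoid is applied.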