Understanding xgb.dump
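I am trying to build an intuition for what xgb.dump shows for a binary classification model with an interaction depth of 1. Specifically, how can the same split (f38 < 2.5) be used twice in a row (output lines [2] and [6])?
The resulting output looks like this: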
xgb.dump(model_2,with.stats=T)
[1] "booster[0]"
[2] "0:[f38<2.5] yes=1,no=2,missing=1,gain=173.793,cover=6317"
[3] "1:leaf=-0.0366182,cover=3279.75"
[4] "2:leaf=-0.0466305,cover=3037.25"
[5] "booster[1]"
[6] "0:[f38<2.5] yes=1,no=2,missing=1,gain=163.887,cover=6314.25"
[7] "1:leaf=-0.035532,cover=3278.65"
[8] "2:leaf=-0.0452568,cover=3035.6"
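Is the only difference between the first and second use of f38 that the residuals have been refit in between? It looked strange to me at first, and I would like to understand exactly what is happening here.
Thanks!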
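Is the only difference between the first and second use of f38 the residual fitting?
Most likely, yes. After the first round it updates the gradients, and in your example it then finds the same feature and the same split point again.
Here is a reproducible example.
Note that when I lower the learning rate in the second example, all three rounds find the same features and the same split points. In the first example, with the high learning rate, each of the 3 rounds splits on a different feature at the root.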
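The gradient update can be checked against the numbers in the question. This is a sketch that assumes the `binary:logistic` objective, the default `base_score` of 0.5, no instance weights, and that the dumped leaf values already include the learning rate; the symbols $g_i$, $h_i$, $G_j$, $H_j$, $\eta$ and $\lambda$ are notation introduced here, not part of the dump.

```latex
\begin{align*}
% Per-row gradient and hessian of the logistic loss at the current prediction p_i
g_i &= p_i - y_i, \qquad h_i = p_i\,(1 - p_i) \\
% Leaf value and cover for leaf j, with G_j and H_j the sums of g_i and h_i in that leaf
w_j &= -\eta\,\frac{G_j}{H_j + \lambda}, \qquad \mathrm{cover}_j = H_j \\
% Round 0: every row starts at p_i = 0.5, so h_i = 0.25
n_{\text{left}} &= 3279.75 / 0.25 = 13119, \qquad n_{\text{right}} = 3037.25 / 0.25 = 12149 \\
% Round 1, left leaf: the raw score is now w = -0.0366182
p &= \sigma(-0.0366182) \approx 0.490846, \quad h \approx 0.249916, \quad 13119 \times h \approx 3278.65 \\
% Round 1, right leaf: the raw score is now w = -0.0466305
p &= \sigma(-0.0466305) \approx 0.488345, \quad h \approx 0.249864, \quad 12149 \times h \approx 3035.60
\end{align*}
```

The two round-1 values match the covers reported under booster[1] (3278.65 and 3035.6), so the same rows fall on each side of f38 < 2.5 in both trees. Only the gradients and hessians moved, and they moved very little because the first tree's leaf values are small. That is why the second tree picks the same split with a slightly lower gain and slightly smaller leaf values.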
require(xgboost)
data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(data = train$data, label=train$label)
# High learning rate: the root split uses a different feature in each tree (f28, f59, f101)
bst <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = 1, nrounds = 3, nthread = 2, objective = "binary:logistic")
xgb.dump(model = bst)
# [1] "booster[0]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [3] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=1.71218"
# [5] "4:leaf=-1.70044" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [7] "5:leaf=-1.94071" "6:leaf=1.85965"
# [9] "booster[1]" "0:[f59<-9.53674e-07] yes=1,no=2,missing=1"
# [11] "1:[f28<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.784718"
# [13] "4:leaf=-0.96853" "2:leaf=-6.23624"
# [15] "booster[2]" "0:[f101<-9.53674e-07] yes=1,no=2,missing=1"
# [17] "1:[f66<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.658725"
# [19] "4:leaf=5.77229" "2:[f110<-9.53674e-07] yes=5,no=6,missing=5"
# [21] "5:leaf=-0.791407" "6:leaf=-9.42142"
# Lower learning rate (eta = 0.01): every tree uses the same splits (f28 at the root, then f55 and f108)
bst2 <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = 0.01, nrounds = 3, nthread = 2, objective = "binary:logistic")
xgb.dump(model = bst2)
# [1] "booster[0]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [3] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.0171218"
# [5] "4:leaf=-0.0170044" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [7] "5:leaf=-0.0194071" "6:leaf=0.0185965"
# [9] "booster[1]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [11] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.016952"
# [13] "4:leaf=-0.0168371" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [15] "5:leaf=-0.0192151" "6:leaf=0.0184251"
# [17] "booster[2]" "0:[f28<-9.53674e-07] yes=1,no=2,missing=1"
# [19] "1:[f55<-9.53674e-07] yes=3,no=4,missing=3" "3:leaf=0.0167863"
# [21] "4:leaf=-0.0166737" "2:[f108<-9.53674e-07] yes=5,no=6,missing=5"
# [23] "5:leaf=-0.0190286" "6:leaf=0.0182581"
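Rather than reading the dumps by eye, the root split of each tree can be pulled into a table. The following is a minimal sketch, not part of the original answer: it assumes `xgb.model.dt.tree()` is available in the installed xgboost version, and it trains with `xgb.train()` on an `xgb.DMatrix` instead of the `xgboost()` wrapper used above. The names `bst_low` and `roots` are made up for illustration.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)

# Same settings as the low learning rate example above (assumed equivalent)
bst_low <- xgb.train(
  params = list(max_depth = 2, eta = 0.01, nthread = 2,
                objective = "binary:logistic"),
  data = dtrain,
  nrounds = 3,
  verbose = 0
)

# One row per node; converted to a plain data.frame for base R subsetting
trees <- as.data.frame(xgb.model.dt.tree(model = bst_low))

# Node 0 is the root of each tree
roots <- trees[trees$Node == 0, c("Tree", "Feature", "Split")]
print(roots)
```

If the low learning rate behaves as in the dump above, the `Feature` column should show the same entry for every tree. Note that this table may print the stored column names of the data rather than the `f28`-style indices that `xgb.dump` shows.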