Why does training an XGBoost model with pseudo-Huber loss return a constant test metric?

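I am trying to fit an xgboost model using the native pseudo-Huber loss, reg:pseudohubererror. However, it does not seem to work: neither the training error nor the test error improves. It works fine with reg:squarederror. What am I missing?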

Code:

library(xgboost)

# Simulated data: linear signal in two features plus unit-variance noise
n = 1000
X = cbind(runif(n,10,20), runif(n,0,10))
y = X %*% c(2,3) + rnorm(n,0,1)

# Train on all rows except the last one
train = xgb.DMatrix(data  = X[-n,],
                    label = y[-n])

# Hold out the last row as a one-row test set
test = xgb.DMatrix(data   = t(as.matrix(X[n,])),
                   label = y[n])

watchlist = list(train = train, test = test)

xbg_test = xgb.train(data = train,
                     objective = "reg:pseudohubererror",
                     eval_metric = "mae",
                     watchlist = watchlist,
                     gamma = 1,
                     eta = 0.01,
                     nrounds = 10000,
                     early_stopping_rounds = 100)

Result:

[1] train-mae:44.372692 test-mae:33.085709 
Multiple eval metrics are present. Will use test_mae for early stopping.
Will train until test_mae hasn't improved in 100 rounds.

[2] train-mae:44.372692 test-mae:33.085709 
[3] train-mae:44.372688 test-mae:33.085709 
[4] train-mae:44.372688 test-mae:33.085709 
[5] train-mae:44.372688 test-mae:33.085709 
[6] train-mae:44.372688 test-mae:33.085709 
[7] train-mae:44.372688 test-mae:33.085709 
[8] train-mae:44.372688 test-mae:33.085709 
[9] train-mae:44.372688 test-mae:33.085709 
[10]    train-mae:44.372692 test-mae:33.085709 

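This seems to be the expected behavior of the pseudo-Huber loss. Here I hard-coded the first and second derivatives of the objective loss function (found here) and fed them in via the obj=obje argument. If you run it and compare with the objective="reg:pseudohubererror" version, you will see that they are the same. As for why it performs so much worse than squared loss, I am not sure.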

set.seed(20)

# Custom objective: returns the gradient and Hessian for each prediction
obje=function(pred, dData) {
  labels=getinfo(dData, "label")
  a=pred
  d=labels
  # Hard-coded first and second derivatives of the loss
  fir=a^2/sqrt(a^2/d^2+1)/d-2*d*(sqrt(a^2/d^2+1)-1)
  sec=((2*(a^2/d^2+1)^(3/2)-2)*d^2-3*a^2)/((a^2/d^2+1)^(3/2)*d^2)
  return (list(grad=fir, hess=sec))
}

xbg_test = xgb.train(data = train,
                     obj = obje,
                     eval_metric = "mae",
                     watchlist = watchlist,
                     gamma = 1,
                     eta = 0.01,
                     nrounds = 10000,
                     early_stopping_rounds = 100)

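I do not have an answer to the "why" part of the question, but I ran into the same problem and found a solution that worked for me.

In my case, the algorithm only started to converge after I standardized the labels: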

Label_standard = (Label - mean(Label)) / sd(Label)

Be careful to compute the mean and standard deviation from the training data only, not from the test dataset!

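After training the model and generating predictions, you need to transform the standardized predictions back to the original scale, using the mean and standard deviation computed from the training dataset.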
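
The workflow described above can be sketched in R roughly as follows. This is only a minimal sketch, not the code I actually ran: the data are regenerated as in the question, the names y_mean, y_sd, fit and pred are illustrative, and the eta and nrounds values are arbitrary assumptions.

```r
library(xgboost)

set.seed(20)
n <- 1000
X <- cbind(runif(n, 10, 20), runif(n, 0, 10))
y <- as.numeric(X %*% c(2, 3) + rnorm(n, 0, 1))

# Hold out the last row, as in the question
X_train <- X[-n, ]
y_train <- y[-n]
X_test  <- X[n, , drop = FALSE]

# Standardization statistics come from the training labels only
y_mean <- mean(y_train)
y_sd   <- sd(y_train)
y_train_std <- (y_train - y_mean) / y_sd

train_std <- xgb.DMatrix(data = X_train, label = y_train_std)

# eta and nrounds are illustrative values, not tuned
fit <- xgb.train(params = list(objective = "reg:pseudohubererror", eta = 0.1),
                 data = train_std,
                 nrounds = 200)

# Predictions are on the standardized scale; map them back to the original one
pred_std <- predict(fit, xgb.DMatrix(data = X_test))
pred     <- pred_std * y_sd + y_mean
```

If you monitor an evaluation metric during training, keep in mind that it is computed on the standardized labels, so it is not directly comparable to an MAE on the original scale.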

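I got this idea because I noticed that the algorithm did not converge when the label values were "large". I would also be interested in understanding the "why".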
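
One possible explanation for the effect of "large" labels, offered as an assumption rather than a confirmed diagnosis, comes from the derivatives of the standard pseudo-Huber loss. The sketch below assumes slope δ = 1, residual r = prediction − label, and an initial prediction near 0.5, which would make the typical residual in the question about 44 (matching the first logged train-mae).

```latex
L_\delta(r) = \delta^2\left(\sqrt{1 + (r/\delta)^2} - 1\right),
\qquad
\frac{\partial L_\delta}{\partial r} = \frac{r}{\sqrt{1 + (r/\delta)^2}},
\qquad
\frac{\partial^2 L_\delta}{\partial r^2} = \left(1 + (r/\delta)^2\right)^{-3/2}

% With \delta = 1 and |r| \approx 44:
\left|\frac{\partial L}{\partial r}\right| \approx 1,
\qquad
\frac{\partial^2 L}{\partial r^2} \approx 44^{-3} \approx 1.2 \times 10^{-5}
```

Summed over the 999 training rows, the Hessian is then on the order of 10^-2, far below the default min_child_weight of 1 (the minimum sum of Hessians required in a child), which may be what keeps the trees from making any useful update. After standardization the residuals are of order 1, so the individual Hessians are no longer vanishingly small.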