Custom multi-class log-loss function for lightGBM in python returns error
I am trying to implement a lightGBM classifier with a custom objective function. My target data has four classes, and my data is divided into natural groups of 12 observations.
The custom objective function achieves two things:
- The predicted model output must be probabilistic, and the probabilities must sum to 1 for each observation. This is also known as a softmax objective function and is relatively straightforward to implement.
- The probabilities for each class must also sum to 1 across each group. This has been done before in the binomial classification space, where it is known as a conditional logit model (see the toy sketch just after this list).
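A toy sketch of the combined constraint, using hypothetical numbers and simple alternating row/column normalisation (not my actual function, which appears further down):

import numpy as np

p = np.random.rand(4, 4)  # one group: 4 observations (rows) by 4 classes (columns)
for _ in range(100):
    p = p / p.sum(axis=1, keepdims=True)  # each observation's probabilities sum to 1
    p = p / p.sum(axis=0, keepdims=True)  # each class's probabilities sum to 1 within the group
print(p.sum(axis=1), p.sum(axis=0))  # both converge to vectors of ones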
Taken together, for each group (of 4 observations in my case), the probabilities should sum to 1 down each column as well as across each row. I have written a somewhat clumsy function that achieves this, but when I try to run my custom objective function within python's lightGBM framework, I get the following error:
TypeError: cannot unpack non-iterable numpy.float64 object
My full code is below:
import lightgbm as lgb
import numpy as np
import pandas as pd

def standardiseProbs(preds, groupSize, eta=0.1, maxIter=100):
    # add groupId to preds dataframe
    n = preds.shape[0]
    if n % groupSize != 0:
        print('The selected group size parameter is not compatible with the data')
    preds['groupId'] = np.repeat(np.arange(0, int(n/groupSize)), groupSize)
    # initialise variables
    error = 10000
    i = 0
    # perform loop while error exceeds set threshold (subject to maxIter)
    while error > eta and i < maxIter:
        i += 1
        # get sum of probabilities by game
        byGroup = preds.groupby('groupId')[[0, 1, 2, 3]].sum().reset_index()
        byGroup.columns = ['groupId', '0G', '1G', '2G', '3G']
        if '3G' in list(preds.columns):
            preds = preds.drop(['3G', '2G', '1G', '0G'], axis=1)
        preds = preds.merge(byGroup, how='inner', on='groupId')
        # adjust probs to be consistent across a game
        for v in [1, 2, 3]:
            preds[v] = preds[v] / preds[str(v) + 'G']
        preds[0] = (groupSize-3) * (preds[0] / preds['0G'])
        # sum probabilities by player
        preds['rowSum'] = preds[3] + preds[2] + preds[1] + preds[0]
        # adjust probs to be consistent across a player
        for v in [0, 1, 2, 3]:
            preds[v] = preds[v] / preds['rowSum']
        # get sum of probabilities by game
        byGroup = preds.groupby('groupId')[[0, 1, 2, 3]].sum().reset_index()
        byGroup.columns = ['groupId', '0G', '1G', '2G', '3G']
        # calc error
        errMat = abs(np.subtract(byGroup[['0G', '1G', '2G', '3G']].values, np.array([(groupSize-3), 1, 1, 1])))
        error = sum(sum(errMat))
    preds = preds[['groupId', 0, 1, 2, 3]]
    return preds
def condObjective(preds, train):
    labels = train.get_label().astype(int)
    preds = pd.DataFrame(np.reshape(preds, (int(preds.shape[0]/4), 4), order='C'), columns=[0, 1, 2, 3])
    n = preds.shape[0]
    yy = np.zeros((n, 4))
    yy[np.arange(n), labels] = 1
    preds['matchId'] = np.repeat(np.arange(0, int(n/4)), 4)
    preds = preds[['matchId', 0, 1, 2, 3]]
    preds = standardiseProbs(preds, groupSize=4, eta=0.001, maxIter=500)
    preds = preds[[0, 1, 2, 3]].values
    grad = (preds - yy).flatten()
    hess = (preds * (1. - preds)).flatten()
    return grad, hess
def mlogloss(preds, train):
    labels = train.get_label().astype(int)
    preds = pd.DataFrame(np.reshape(preds, (int(preds.shape[0]/4), 4), order='C'), columns=[0, 1, 2, 3])
    n = preds.shape[0]
    yy = np.zeros((n, 4))
    yy[np.arange(n), labels] = 1
    preds['matchId'] = np.repeat(np.arange(0, int(n/4)), 4)
    preds = preds[['matchId', 0, 1, 2, 3]]
    preds = standardiseProbs(preds, groupSize=4, eta=0.001, maxIter=500)
    preds = preds[[0, 1, 2, 3]].values
    loss = -(np.sum(yy*np.log(preds) + (1-yy)*np.log(1-preds))/n)
    return loss
n, k = 880, 5
xtrain = np.random.rand(n, k)
ytrain = np.random.randint(low=0, high=2, size=n)
ltrain = lgb.Dataset(xtrain, label=ytrain)
xtest = np.random.rand(int(n/2), k)
ytest = np.random.randint(low=0, high=2, size=int(n/2))
ltest = lgb.Dataset(xtest, label=ytest)
lgbmParams = {'boosting_type': 'gbdt',
              'num_leaves': 250,
              'max_depth': 3,
              'min_data_in_leaf': 10,
              'min_gain_to_split': 0.75,
              'learning_rate': 0.01,
              'subsample_for_bin': 120100,
              'min_child_samples': 70,
              'reg_alpha': 1.45,
              'reg_lambda': 2.5,
              'feature_fraction': 0.45,
              'bagging_fraction': 0.55,
              'is_unbalance': True,
              'objective': 'multiclass',
              'num_class': 4,
              'metric': 'multi_logloss',
              'verbose': 1}
lgbmModel = lgb.train(lgbmParams, ltrain, valid_sets=ltest, fobj=condObjective, feval=mlogloss, num_boost_round=5000, early_stopping_rounds=100, verbose_eval=50)
Assuming there is no better way to force my predictions to conform to the constraints I have imposed, what do I need to do to get the custom objective to work?
The problem with this error:

-> 2380     eval_name, val, is_higher_better = feval_ret  # this is the return of mlogloss
   2381     ret.append((data_name, eval_name, val, is_higher_better))
   2382     return ret
TypeError: 'numpy.float64' object is not iterable

comes from the function mlogloss(). Because you use it as an eval function (feval=mlogloss), it should return 3 things: its name, its value, and a boolean indicating whether a higher value is better:
def mlogloss(...):
    ...
    return "my_loss_name", loss_value, False
Two functions are needed, one for training and one for validation: the training custom loss (fobj in the lgb.train arguments) must return grad, hess, while the validation custom metric (feval) must return name, value, is_higher_better, where the boolean indicates whether a higher loss value is better.
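Applied to the code in the question, a minimal sketch of the fixed eval function (everything else unchanged; 'mlogloss' is just an arbitrary display name):

def mlogloss(preds, train):
    labels = train.get_label().astype(int)
    preds = pd.DataFrame(np.reshape(preds, (int(preds.shape[0]/4), 4), order='C'), columns=[0, 1, 2, 3])
    n = preds.shape[0]
    yy = np.zeros((n, 4))
    yy[np.arange(n), labels] = 1
    preds['matchId'] = np.repeat(np.arange(0, int(n/4)), 4)
    preds = standardiseProbs(preds[['matchId', 0, 1, 2, 3]], groupSize=4, eta=0.001, maxIter=500)
    preds = preds[[0, 1, 2, 3]].values
    loss = -(np.sum(yy*np.log(preds) + (1-yy)*np.log(1-preds))/n)
    # lower log-loss is better, so is_higher_better is False
    return 'mlogloss', loss, False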
Take a look at this (not my blog): https://maxhalford.github.io/blog/lightgbm-focal-loss/#lightgbm-custom-loss-function-caveats