Using f-score in xgb
I'm trying to use scikit-learn's f-score as the evaluation metric in an xgb classifier. Here is my code:
clf = xgb.XGBClassifier(max_depth=8, learning_rate=0.004,
                        n_estimators=100,
                        silent=False, objective='binary:logistic',
                        nthread=-1, gamma=0,
                        min_child_weight=1, max_delta_step=0, subsample=0.8,
                        colsample_bytree=0.6,
                        base_score=0.5,
                        seed=0, missing=None)
scores = []
predictions = []
for train, test, ans_train, y_test in zip(trains, tests, ans_trains, ans_tests):
    clf.fit(train, ans_train, eval_metric=xgb_f1,
            eval_set=[(train, ans_train), (test, y_test)],
            early_stopping_rounds=900)
    y_pred = clf.predict(test)
    predictions.append(y_pred)
    scores.append(f1_score(y_test, y_pred))

def xgb_f1(y, t):
    t = t.get_label()
    return "f1", f1_score(t, y)
But I get an error: Can't handle mix of binary and continuous
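The problem is that f1_score is trying to compare non-binary predictions against binary targets, and by default this method uses binary averaging. From the documentation: "average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']".
In any case, the error is telling you that your predictions are continuous, like [0.001, 0.7889, 0.33...], while your targets are binary, like [0, 1, 0...]. So if you know your threshold, I recommend preprocessing the predictions before passing them to the f1_score function. The usual threshold value is 0.5.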
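To make the thresholding step concrete, here is a minimal dependency-free sketch. The helper names `binarize` and `f1_binary` are made up for illustration and are not part of scikit-learn or xgboost; `f1_binary` only mirrors the standard F1 definition for the positive class, and the sample values are invented.

```python
def binarize(probs, threshold=0.5):
    """Turn continuous scores into hard 0/1 labels (hypothetical helper)."""
    return [1 if p > threshold else 0 for p in probs]


def f1_binary(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [0, 1, 0, 1]                  # binary targets
y_prob = [0.001, 0.7889, 0.33, 0.4]    # continuous, like the model's raw output
y_pred = binarize(y_prob)              # hard labels after thresholding at 0.5
score = f1_binary(y_true, y_pred)      # TP=1, FP=0, FN=1, so F1 = 2/3
```

Only once the scores are hard labels does the confusion-matrix arithmetic behind F1 make sense, which is why the continuous output has to be binarized first.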
Here is a tested version of your evaluation function. It no longer raises the error:
def xgb_f1(y, t, threshold=0.5):
    t = t.get_label()
    y_bin = [1. if y_cont > threshold else 0. for y_cont in y]  # binarizing your output
    return 'f1', f1_score(t, y_bin)
As suggested by @smci, a less verbose and more efficient solution could be:
def xgb_f1(y, t, threshold=0.5):
    t = t.get_label()
    y_bin = (y > threshold).astype(int)  # works for both numpy.ndarray and pandas Series
    return 'f1', f1_score(t, y_bin)
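As a quick sanity check that the two binarization styles agree, the following sketch compares them on a small NumPy array. The sample values are invented for illustration, and it assumes `y` arrives as a `numpy.ndarray`.

```python
import numpy as np

y = np.array([0.001, 0.7889, 0.33, 0.5, 0.51])  # made-up predicted probabilities
threshold = 0.5

# List-comprehension version: one Python-level comparison per element.
y_loop = [1. if y_cont > threshold else 0. for y_cont in y]

# Vectorized version: a single elementwise comparison, then a cast to int.
y_vec = (y > threshold).astype(int)

# Both give the same labels; a score exactly at the threshold maps to 0.
same = np.array_equal(np.asarray(y_loop, dtype=int), y_vec)
```

The vectorized form avoids the Python-level loop, which tends to matter when the metric is evaluated on every boosting round.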