XGBoost not running with calibrated classifier

Question

I am trying to run XGBoost with a calibrated classifier. Below is the code snippet that raises the error:

from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier
import numpy as np

x_train =np.array([1,2,2,3,4,5,6,3,4,10,]).reshape(-1,1)
y_train = np.array([1,1,1,1,1,3,3,3,3,3])

x_cfl=XGBClassifier(n_estimators=1)
x_cfl.fit(x_train,y_train)
sig_clf = CalibratedClassifierCV(x_cfl, method="sigmoid")
sig_clf.fit(x_train, y_train)

Error:

TypeError: predict_proba() got an unexpected keyword argument 'X'

Full traceback:

TypeError                                Traceback (most recent call last)
<ipython-input-48-08dd0b4ae8aa> in <module>
----> 1 sig_clf.fit(x_train, y_train)

~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in fit(self, X, y, sample_weight)
    309                 parallel = Parallel(n_jobs=self.n_jobs)
    310 
--> 311                 self.calibrated_classifiers_ = parallel(
    312                     delayed(_fit_classifier_calibrator_pair)(
    313                         clone(base_estimator), X, y, train=train, test=test,

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
   1039             # remaining jobs.
   1040             self._iterating = False
-> 1041             if self.dispatch_one_batch(iterator):
   1042                 self._iterating = self._original_iterator is not None
   1043 

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    857                 return False
    858             else:
--> 859                 self._dispatch(tasks)
    860                 return True
    861 

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in _dispatch(self, batch)
    775         with self._lock:
    776             job_idx = len(self._jobs)
--> 777             job = self._backend.apply_async(batch, callback=cb)
    778             # A job can complete so quickly than its callback is
    779             # called before we get here, causing self._jobs to

~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    206     def apply_async(self, func, callback=None):
    207         """Schedule a func to be run"""
--> 208         result = ImmediateResult(func)
    209         if callback:
    210             callback(result)

~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    570         # Don't delay the application, to avoid keeping the input
    571         # arguments in memory
--> 572         self.results = batch()
    573 
    574     def get(self):

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self)
    260         # change the default number of processes to -1
    261         with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262             return [func(*args, **kwargs)
    263                     for func, args, kwargs in self.items]
    264 

~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in <listcomp>(.0)
    260         # change the default number of processes to -1
    261         with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262             return [func(*args, **kwargs)
    263                     for func, args, kwargs in self.items]
    264 

~/anaconda3/lib/python3.8/site-packages/sklearn/utils/fixes.py in __call__(self, *args, **kwargs)
    220     def __call__(self, *args, **kwargs):
    221         with config_context(**self.config):
--> 222             return self.function(*args, **kwargs)

~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in _fit_classifier_calibrator_pair(estimator, X, y, train, test, supports_sw, method, classes, sample_weight)
    443     n_classes = len(classes)
    444     pred_method = _get_prediction_method(estimator)
--> 445     predictions = _compute_predictions(pred_method, X[test], n_classes)
    446 
    447     sw = None if sample_weight is None else sample_weight[test]

~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in _compute_predictions(pred_method, X, n_classes)
    499         (X.shape[0], 1).
    500     """
--> 501     predictions = pred_method(X=X)
    502     if hasattr(pred_method, '__name__'):
    503         method_name = pred_method.__name__

TypeError: predict_proba() got an unexpected keyword argument 'X'

This surprised me, because it was running fine until yesterday, and the same code runs when I use a different classifier:

from sklearn.calibration import CalibratedClassifierCV
from lightgbm import LGBMClassifier
import numpy as np

x_train = np.array([1,2,2,3,4,5,6,3,4,10,]).reshape(-1,1)
y_train = np.array([1,1,1,1,1,3,3,3,3,3])


x_cfl=LGBMClassifier(n_estimators=1)
x_cfl.fit(x_train,y_train)
sig_clf = CalibratedClassifierCV(x_cfl, method="sigmoid")
sig_clf.fit(x_train, y_train)

Output:

CalibratedClassifierCV(base_estimator=LGBMClassifier(n_estimators=1))

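Is something wrong with my XGBoost installation? I installed it with conda, and the last thing I remember doing is uninstalling and reinstalling xgboost yesterday.

My xgboost version: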

1.3.0

Answer 1

This is fixed now. It seems there is a bug in scikit-learn 0.24.

I downgraded to 0.22.2.post1 and that fixed it!

Answer 2

I think the problem is in XGBoost. It is explained here: https://github.com/dmlc/xgboost/pull/6555

XGBoost defines:

predict_proba(self, data, ...

instead of:

predict_proba(self, X, ...

Because sklearn 0.24 calls clf.predict_proba(X=X), an exception is raised.
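To check whether a given estimator is affected, one option is to inspect the signature of its predict_proba method. The sketch below is only an illustration: accepts_keyword, DataStyle, and XStyle are hypothetical names, and the two small classes are stand-ins for real estimators so that the snippet runs without xgboost installed.

```python
import inspect


def accepts_keyword(method, name="X"):
    """Return True if `method` can be called with `name` as a keyword argument."""
    params = inspect.signature(method).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all also accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class DataStyle:
    """Hypothetical stand-in: first parameter is named `data`."""

    def predict_proba(self, data):
        return data


class XStyle:
    """Hypothetical stand-in: first parameter is named `X`."""

    def predict_proba(self, X):
        return X


print(accepts_keyword(DataStyle().predict_proba))  # keyword call with X=... would fail
print(accepts_keyword(XStyle().predict_proba))     # keyword call with X=... would work
```

With a real installation, passing XGBClassifier().predict_proba to the same helper should report whether the installed version accepts X as a keyword.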

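Here is an idea for working around the problem without changing package versions: create a class that inherits from XGBClassifier, overrides predict_proba with the correct parameter name, and calls super().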
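A minimal sketch of that subclassing idea follows. To keep it runnable without xgboost, LegacyClassifier is a hypothetical stand-in whose predict_proba names its first parameter data, as described above; PatchedClassifier and call_like_sklearn are also illustrative names, not part of any library.

```python
class LegacyClassifier:
    """Hypothetical stand-in for an estimator whose predict_proba
    names its first parameter `data` instead of `X`."""

    def predict_proba(self, data):
        # Return a fixed two-class probability row for each sample.
        return [[0.5, 0.5] for _ in data]


class PatchedClassifier(LegacyClassifier):
    """Subclass exposing the sklearn-style parameter name `X`."""

    def predict_proba(self, X, *args, **kwargs):
        # Forward X positionally so the parent's parameter name no longer matters.
        return super().predict_proba(X, *args, **kwargs)


def call_like_sklearn(clf, X):
    # Mimics the keyword call shown in the traceback: pred_method(X=X).
    return clf.predict_proba(X=X)


print(call_like_sklearn(PatchedClassifier(), [[1], [2]]))
```

In practice the parent class would be XGBClassifier rather than the stand-in, and the patched subclass would be the estimator passed to CalibratedClassifierCV. That combination is not tested here, so treat it as a starting point.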

Answer 3

The best approach is to upgrade xgboost.

Run the following from a Jupyter notebook:

pip install xgboost --upgrade
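After upgrading, it can help to confirm which versions the notebook kernel actually sees, since a conda environment and pip may not point at the same installation. The snippet below is a small sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ("xgboost", "scikit-learn"):
    print(name, installed_version(name))
```

Restart the kernel after the upgrade so that the newly installed package is the one imported.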
