Custom estimator can't be deep-copied by cross_val_score

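I have a custom estimator that I implemented myself, but I can't use cross_val_score() with it, and I think the problem is related to my predict() method. Here is the full error traceback: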

Traceback (most recent call last):
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/test.py", line 30, in <module>
    ada2_score = cross_val_score(ada_2, X, y, cv=5)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 390, in cross_val_score
    error_score=error_score)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 236, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 1004, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 835, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 754, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 209, in apply_async
    result = ImmediateResult(func)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 590, in __init__
    self.results = batch()
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 544, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 591, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 89, in __call__
    score = scorer(estimator, *args, **kwargs)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 371, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/Adaboost.py", line 92, in score
    scr_pred = self.predict(X)
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/Adaboost.py", line 73, in predict
    clf_pred = clf.predict(X)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn_extensions/extreme_learning_machines/elm.py", line 614, in predict
    class_predictions = self.binarizer.inverse_transform(raw_predictions)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 528, in inverse_transform
    self.classes_, threshold)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 750, in _inverse_binarize_thresholding
    format(y.shape))
ValueError: output_type='binary', but y.shape = (30, 3)

My predict(self, X) method returns a vector of size n_samples containing the predictions for the X argument. I also wrote a score() function, as follows:

def score(self, X, y):
    scr_pred = self.predict(X)
    return sum(scr_pred == y) / X.shape[0]

This method simply computes the model's accuracy on the given samples. It fails both when I rely on this score() method and when I call cross_val_score(..., scoring="accuracy").
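For comparison, here is a minimal sketch of a custom estimator whose score() does run under cross_val_score. MajorityClassifier and the synthetic data are hypothetical and exist only for illustration; the score() body is the same accuracy computation as above. If a baseline like this works, the failure is more likely to come from what predict() does internally than from score() itself.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import cross_val_score


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.classes_ = labels
        self.majority_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        # One prediction per row of X, i.e. a vector of size n_samples.
        return np.full(X.shape[0], self.majority_)

    def score(self, X, y):
        # Same accuracy computation as the score() method in the question.
        scr_pred = self.predict(X)
        return sum(scr_pred == y) / X.shape[0]


# Synthetic data, assumed only for this sketch.
rng = np.random.RandomState(0)
X = rng.rand(60, 2)
y = np.array([0] * 40 + [1] * 20)

# With no scoring argument, cross_val_score falls back to estimator.score().
scores = cross_val_score(MajorityClassifier(), X, y, cv=5)
print(scores)
```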

Note: I am aware of a related question, but it does not apply to my case, because I can confirm that my constructor is consistent:

def __init__(self, estimators=["MLP"], n_rounds=5, random_state=10):
    self.estimators = estimators
    self.n_rounds = n_rounds
    self.random_state = random_state
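One way to check that kind of consistency is to call sklearn.base.clone directly, since cloning rebuilds the estimator from get_params(). The sketch below uses two hypothetical classes (ConsistentInit and ModifyingInit are not from the original code), and it swaps the list default for a tuple only to avoid a mutable default argument. The first class stores every argument unchanged and clones cleanly. The second alters its argument in the constructor, which clone is expected to reject.

```python
from sklearn.base import BaseEstimator, clone


class ConsistentInit(BaseEstimator):
    # Hypothetical stand-in with the same constructor pattern as above:
    # every argument is stored unchanged under its own name.
    def __init__(self, estimators=("MLP",), n_rounds=5, random_state=10):
        self.estimators = estimators
        self.n_rounds = n_rounds
        self.random_state = random_state


class ModifyingInit(BaseEstimator):
    # Hypothetical counter-example: the argument is altered before storing.
    def __init__(self, estimators=("MLP",)):
        self.estimators = list(estimators)


# Cloning the consistent estimator yields an unfitted copy with equal parameters.
copy_ok = clone(ConsistentInit(n_rounds=3))
print(copy_ok.get_params())

# Cloning the inconsistent one fails clone's sanity check on the parameters.
try:
    clone(ModifyingInit())
    clone_failed = False
except RuntimeError as exc:
    clone_failed = True
    print(exc)
```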

Update:

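Further research led me to a post explaining that sklearn cannot deep-copy an estimator that uses a transformer. However, my estimator has to run LabelBinarizer to transform the data in order to produce predictions, so I updated the question title to reflect the actual problem.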

Your problem statement is not entirely clear, but judging by the error, you appear to be attempting multiclass classification.

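The problem is that your code probably does not perform the preprocessing correctly at some point. The error is raised from _inverse_binarize_thresholding, by the following check in sklearn's preprocessing module: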

def _inverse_binarize_thresholding(y, output_type, classes, threshold):
   
    if output_type == "binary" and y.ndim == 2 and y.shape[1] > 2:
        raise ValueError("output_type='binary', but y.shape = {0}".
                         format(y.shape))

Some transformation or preprocessing step must be missing from your code; you need to use LabelBinarizer correctly.
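As an illustration, the sketch below shows one way the same message can arise: a LabelBinarizer that was fitted on only two labels is asked to invert a score matrix with three columns. This is an assumption about the mechanism, not a diagnosis of the code in the question; the random raw_scores array merely mimics the (30, 3) shape from the traceback. A binarizer fitted on all three labels decodes the same matrix without error.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Stand-in for the raw per-class outputs: 30 samples, 3 columns.
rng = np.random.RandomState(0)
raw_scores = rng.rand(30, 3)

# Fitted on two labels only, so the binarizer treats the problem as binary.
binary_lb = LabelBinarizer().fit([0, 1])
try:
    binary_lb.inverse_transform(raw_scores)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Fitted on all three labels, the binarizer treats the problem as multiclass
# and picks the class with the highest score in each row.
multi_lb = LabelBinarizer().fit([0, 1, 2])
decoded = multi_lb.inverse_transform(raw_scores)
print(decoded.shape)
```

If this is what happens in your case, the thing to check is which labels the binarizer actually saw when it was fitted inside each cross-validation fold.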

Read the documentation below and trace back through the error to fix your code.

documentation