Custom estimator can't be deepcopied by cross_val_score
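I have a custom estimator that I implemented myself, but I cannot use cross_val_score() with it, and I think this is related to my predict() method. Here is the full error traceback: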
Traceback (most recent call last):
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/test.py", line 30, in <module>
    ada2_score = cross_val_score(ada_2, X, y, cv=5)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 390, in cross_val_score
    error_score=error_score)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 236, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 1004, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 835, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 754, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 209, in apply_async
    result = ImmediateResult(func)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 590, in __init__
    self.results = batch()
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 544, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 591, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 89, in __call__
    score = scorer(estimator, *args, **kwargs)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 371, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/Adaboost.py", line 92, in score
    scr_pred = self.predict(X)
  File "/Users/joann/Desktop/Implementações ML/Adaboost Classifier/Adaboost.py", line 73, in predict
    clf_pred = clf.predict(X)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn_extensions/extreme_learning_machines/elm.py", line 614, in predict
    class_predictions = self.binarizer.inverse_transform(raw_predictions)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 528, in inverse_transform
    self.classes_, threshold)
  File "/Users/joann/opt/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 750, in _inverse_binarize_thresholding
    format(y.shape))
ValueError: output_type='binary', but y.shape = (30, 3)
My predict(self, X) method returns a vector of size n_samples containing the predictions for the X argument. I also wrote a score() function, as follows:
def score(self, X, y):
    scr_pred = self.predict(X)
    return sum(scr_pred == y) / X.shape[0]
This method simply computes the model's accuracy on the given samples. It fails whether I use this score() method or set cross_val_score(... , scoring="accuracy").
Note: I am aware of a related issue, but it does not apply to my case, because I can confirm that my constructor is consistent:
def __init__(self, estimators=["MLP"], n_rounds=5, random_state=10):
    self.estimators = estimators
    self.n_rounds = n_rounds
    self.random_state = random_state
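Update:
Further research led me to a source explaining that sklearn cannot deep-copy an estimator that uses transformers. However, my estimator has to run LabelBinarizer to transform the data in order to get predictions, so I updated the question title to reflect the actual problem.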
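Your problem statement is not entirely clear, but judging from the error, you appear to be attempting multi-class classification.
The problem is that your code is probably not preprocessing the data correctly at some point: the error is raised from _inverse_binarize_thresholding, the following function in sklearn's preprocessing module: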
def _inverse_binarize_thresholding(y, output_type, classes, threshold):
    if output_type == "binary" and y.ndim == 2 and y.shape[1] > 2:
        raise ValueError("output_type='binary', but y.shape = {0}".
                         format(y.shape))
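As a rough illustration of when that check fires, the sketch below fits one LabelBinarizer on two classes and another on three, then asks each to invert a three-column prediction matrix. The random matrix and the variable names are made up for the example; this is one way to reproduce the same kind of ValueError, not necessarily what happens inside the asker's estimator.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical raw outputs: 30 samples, 3 columns (one per class).
raw_predictions = np.random.RandomState(0).rand(30, 3)

# Binarizer fitted on only two classes, so its label type is "binary".
binary_binarizer = LabelBinarizer().fit([0, 1])

try:
    binary_binarizer.inverse_transform(raw_predictions)
    message = None
except ValueError as exc:
    # A binary binarizer cannot invert a matrix with more than 2 columns.
    message = str(exc)
print(message)

# Binarizer fitted on three classes: column count and label type agree,
# so each row is mapped back to the class with the largest value.
multiclass_binarizer = LabelBinarizer().fit([0, 1, 2])
labels = multiclass_binarizer.inverse_transform(raw_predictions)
print(labels.shape)
```

If the binarizer and the prediction matrix come from different fits (for example, a binarizer refitted on a subset containing only two classes), the mismatch looks like the first case.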
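The check fires when the binarizer was fitted on binary labels (output_type == "binary") but the array passed to inverse_transform has more than two columns, here (30, 3). Your code is most likely missing a transformation or preprocessing step, so make sure you use LabelBinarizer correctly.
Read its documentation and trace back through the error to fix your code.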
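For comparison, here is a minimal sketch of a custom ensemble that cross_val_score can clone and score. It is not the asker's AdaBoost: VotingSketch, its base_estimator parameter, the bootstrap-and-vote logic, and the iris data are all assumptions chosen for illustration. The points it tries to show are that __init__ stores its arguments untouched, and that every sub-estimator is freshly cloned and fitted inside fit(), so each fold gets sub-estimators (and any internal label binarizers) fitted on that fold's training labels.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


class VotingSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical majority-vote ensemble used only for illustration."""

    def __init__(self, base_estimator=None, n_rounds=5, random_state=10):
        # Store parameters as given so sklearn.base.clone can rebuild us.
        self.base_estimator = base_estimator
        self.n_rounds = n_rounds
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.RandomState(self.random_state)
        base = self.base_estimator
        if base is None:
            base = DecisionTreeClassifier(max_depth=2)
        self.classes_ = np.unique(y)
        self.estimators_ = []
        for _ in range(self.n_rounds):
            # Bootstrap sample of the training fold.
            idx = rng.randint(0, len(X), len(X))
            # Fresh unfitted copy per round: no fitted state leaks across folds.
            self.estimators_.append(clone(base).fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # votes has shape (n_rounds, n_samples).
        votes = np.stack([est.predict(X) for est in self.estimators_])
        # counts has shape (n_classes, n_samples): votes received per class.
        counts = np.stack([(votes == c).sum(axis=0) for c in self.classes_])
        return self.classes_[counts.argmax(axis=0)]


X, y = load_iris(return_X_y=True)
# ClassifierMixin supplies score() as accuracy, so no custom score is needed.
scores = cross_val_score(VotingSketch(), X, y, cv=5)
print(scores)
```

Passing already-fitted sub-estimators into the constructor, or fitting them once outside fit(), is a common way for state from one fold to end up being used on another.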