ZeroDivisionError when using sklearn's BaggingClassifier with GridSearchCV
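I am trying to improve a Bernoulli Naive Bayes model, which already works perfectly, by using bagging. However, when I try to cross-validate a BaggingClassifier, I get a very unexpected ZeroDivisionError from parallel.py.
I have tried changing every parameter I know of and restarting Python, but nothing helps.
Here is a reproducible example using the iris dataset, modified to have a binary target: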
#%% run
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import BaggingClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.datasets import load_iris

data = load_iris()
data.targetbin = (data.target != 0).astype("int")

param_grid2 = {'max_samples': np.linspace(0.5, 1.0, 3),
               'base_estimator__alpha': np.linspace(0.1, 1, 3),
               'base_estimator__binarize': [*np.linspace(0.0, 1, 3)],
               'base_estimator__fit_prior': [True, False]}
param_grid2 = {'max_samples': [0.7]}

clf = GridSearchCV(
    BaggingClassifier(
        BernoulliNB(),
        n_estimators=10, max_features=0.5),
    param_grid2,
    scoring="accuracy",
    verbose=-1)
clf.fit(data.data, data.targetbin)
Here is my error stack trace:
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Traceback (most recent call last):
  File "", line 33, in
    clf.fit(data.data, data.targetbin)
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 722, in fit
    self._run_search(evaluate_candidates)
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 1191, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 711, in evaluate_candidates
    cv.split(X, y, groups)))
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 917, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 184, in apply_async
    callback(result)
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 306, in __call__
    self.parallel.print_progress()
  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 806, in print_progress
    if (is_last_item or cursor % frequency):
ZeroDivisionError: integer division or modulo by zero
What am I doing wrong?
I tried debugging the library and found that self.verbose in sklearn/externals/joblib/parallel.py is -1, whereas by default it should be at least 0. So I think this is a bug.
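To see how a negative verbose value can produce this exact error, here is a simplified pure-Python sketch of the progress-reporting arithmetic. It assumes, based on the last line of the traceback, that joblib's print_progress returns early only when verbose is falsy and otherwise computes frequency = total_tasks // verbose + 1 before evaluating cursor % frequency. The helpers progress_frequency and should_skip_message are hypothetical names for illustration, not joblib functions. With verbose=-1 and a single dispatched task, floor division gives 1 // -1 == -1, so frequency becomes 0.

```python
def progress_frequency(total_tasks: int, verbose: int) -> int:
    # Hypothetical helper mirroring the assumed joblib formula:
    # a progress message is printed roughly every `frequency` tasks.
    return total_tasks // verbose + 1


def should_skip_message(index: int, total_tasks: int, verbose: int,
                        pre_dispatch_amount: int = 0) -> bool:
    """Simplified, assumed version of the branch that fails in print_progress."""
    if not verbose:
        # verbose=0 is falsy, so the modulo below is never reached.
        return True
    # A negative verbose such as -1 is truthy, so execution continues here.
    cursor = total_tasks - index + 1 - pre_dispatch_amount
    frequency = progress_frequency(total_tasks, verbose)
    is_last_item = index + 1 == total_tasks
    # If frequency == 0 and is_last_item is False, this raises ZeroDivisionError.
    return bool(is_last_item or cursor % frequency)


if __name__ == "__main__":
    print(progress_frequency(1, -1))   # 1 // -1 + 1 == 0
    try:
        should_skip_message(index=1, total_tasks=1, verbose=-1)
    except ZeroDivisionError as exc:
        print("ZeroDivisionError:", exc)
```

Under that assumption, any verbose >= 0 avoids the zero: 0 skips the branch entirely, and a positive value makes frequency at least 1. Passing verbose=0 (the GridSearchCV default) or a positive integer instead of -1 is therefore the likely workaround.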