Trying to use imblearn.pipeline with RandomOverSampler and DecisionTreeClassifier
I am trying to tune the hyperparameters of a DecisionTreeClassifier with GridSearchCV, and because my data is imbalanced I am also trying to use imblearn.over_sampling.RandomOverSampler.
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline
from sklearn import tree
from sklearn.model_selection import GridSearchCV
dtpass = tree.DecisionTreeClassifier()
pipe1 = Pipeline([('sampling', RandomOverSampler()), ('class', dtpass)])
parameters = {'class__max_depth': range(3, 7),
              'class__ccp_alpha': np.arange(0, 0.001, 0.00025),
              'class__min_samples_leaf': [50]
              }
dt2 = GridSearchCV(estimator=pipe1,
                   param_grid=parameters,
                   n_jobs=4,
                   scoring='roc_auc'
                   )
dt2.fit(x, y)
This returns an error:
AttributeError: 'RandomOverSampler' object has no attribute '_validate_data'
What am I doing wrong?
Edit: solution posted below.
Try this:
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
dtpass = DecisionTreeClassifier()
sampling = RandomOverSampler()
# make_pipeline names each step after its lowercased class name
pipe1 = make_pipeline(sampling, dtpass)
# pipe1 = Pipeline([('sampling', RandomOverSampler()), ('class', dtpass)])
# so the grid keys use the 'decisiontreeclassifier__' prefix, not 'class__'
parameters = {'decisiontreeclassifier__max_depth': range(3, 7),
              'decisiontreeclassifier__ccp_alpha': np.arange(0, 0.001, 0.00025),
              'decisiontreeclassifier__min_samples_leaf': [50]
              }
dt2 = GridSearchCV(estimator=pipe1,
                   param_grid=parameters,
                   n_jobs=4,
                   scoring='roc_auc'
                   )
# x, y: the feature matrix and labels from the question
dt2.fit(x, y)
Link to the solution page, which took a lot of Googling to find:
The solution is
pip install -U imbalanced-learn
instead of
conda install -c conda-forge imbalanced-learn
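As far as I can tell, the missing _validate_data attribute usually points to an imbalanced-learn release running against a scikit-learn release older than the one it expects, which would explain why changing the install method helps. A small sketch for checking what is actually installed in the active environment; the helper name installed_version is my own, and it only uses the standard library (Python 3.8+):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI, not the import names.
for name in ("scikit-learn", "imbalanced-learn"):
    print(name, installed_version(name))
```

Run it before and after the pip install to confirm that the two versions actually changed.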