Trying to use imblearn.pipeline with RandomOverSampler and DecisionTreeClassifier

I am trying to tune the hyperparameters of a DecisionTreeClassifier with GridSearchCV, and because my data is imbalanced, I am trying to use imblearn.over_sampling.RandomOverSampler.

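# Imports other than RandomOverSampler were not shown in the original post;
# Pipeline is assumed to come from imblearn.pipeline, as the title suggests.
import numpy as np
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import RandomOverSampler

dtpass = tree.DecisionTreeClassifier()
pipe1 = Pipeline([('sampling', RandomOverSampler()), ('class', dtpass)])

parameters = {'class__max_depth': range(3, 7),
              'class__ccp_alpha': np.arange(0, 0.001, 0.00025),
              'class__min_samples_leaf': [50]}

dt2 = GridSearchCV(estimator=pipe1,
                   param_grid=parameters,
                   n_jobs=4,
                   scoring='roc_auc')

dt2.fit(x, y)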

This returns an error:

AttributeError: 'RandomOverSampler' object has no attribute '_validate_data'

What am I doing wrong?

Edit: a solution is posted below.

Try this:

import numpy as np
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

dtpass = DecisionTreeClassifier()
sampling = RandomOverSampler()

# make_pipeline names each step after its lowercased class name,
# so the steps here are 'randomoversampler' and 'decisiontreeclassifier'.
pipe1 = make_pipeline(sampling, dtpass)

# Grid keys must be prefixed with the auto-generated step name, not 'class'.
parameters = {'decisiontreeclassifier__max_depth': range(3, 7),
              'decisiontreeclassifier__ccp_alpha': np.arange(0, 0.001, 0.00025),
              'decisiontreeclassifier__min_samples_leaf': [50]}

dt2 = GridSearchCV(estimator=pipe1,
                   param_grid=parameters,
                   n_jobs=4,
                   scoring='roc_auc')

# x, y are the feature matrix and labels from the question.
dt2.fit(x, y)
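
When a pipeline is built with make_pipeline, the step names are generated automatically from the lowercased class names, so the keys in a parameter grid have to use those names rather than custom ones such as 'class'. Below is a minimal sketch for checking which names are valid. It uses sklearn.pipeline.make_pipeline with StandardScaler as a stand-in first step so that it runs without imbalanced-learn; the assumption is that imblearn.pipeline.make_pipeline follows the same naming convention.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# StandardScaler is only a stand-in for the sampler step (an assumption
# made so this sketch needs scikit-learn alone).
pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier())

# Step names are derived from the lowercased class names.
step_names = [name for name, _ in pipe.steps]
print(step_names)

# Every key that GridSearchCV accepts for the tree step starts with its step name.
tree_params = sorted(
    key for key in pipe.get_params()
    if key.startswith("decisiontreeclassifier__")
)
print(tree_params)
```

If custom names like 'sampling' and 'class' are preferred, building the pipeline with Pipeline([...]) and explicit (name, estimator) tuples keeps the original grid keys valid.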

Link to the page with the solution, which took a lot of Googling to find:

https://makerspace.aisingapore.org/community/ai4i-5-supervised-learning/encountered-attributeerror-when-run-train_test_splitpreprocessed_data-output_var-after-randomoversampler/

The solution is to install with

 pip install -U imbalanced-learn

rather than

 conda install -c conda-forge imbalanced-learn
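
Since upgrading the package fixed it, the error was probably caused by mismatched versions of imbalanced-learn and scikit-learn in the environment. Here is a rough sketch for checking what is actually installed before and after reinstalling. The helper name installed_version is made up for this example; it only relies on the standard-library importlib.metadata module (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of the two packages involved in the error.
for name in ("scikit-learn", "imbalanced-learn"):
    print(name, installed_version(name))
```

Run this in the same environment (and the same Jupyter kernel, if applicable) that raises the AttributeError, since pip and conda may be installing into different environments.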