Oversampling (SMOTE) does not work properly when fitted inside a pipeline

Question

I have an imbalanced classification problem and I am using make_pipeline from imblearn.

The steps are as follows:

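from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import RobustScaler

kf = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)
params = {
    'max_depth': [2, 3, 5],
    # 'max_features': ['auto', 'sqrt', 'log2'],
    # 'min_samples_leaf': [5, 10, 20, 50, 100, 200, 300],
    'n_estimators': [10, 25, 30, 50],
    # 'bootstrap': [True, False],
}

# imblearn's make_pipeline accepts samplers such as SMOTE as intermediate steps
imba_pipeline = make_pipeline(SMOTE(random_state=42), RobustScaler(), RandomForestClassifier(random_state=42))
imba_pipeline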

out:Pipeline(steps=[('smote', SMOTE(random_state=42)),
                ('robustscaler', RobustScaler()),
                ('randomforestclassifier',
                 RandomForestClassifier(random_state=42))])

new_params = {'randomforestclassifier__' + key: params[key] for key in params}
grid_imba = GridSearchCV(imba_pipeline, param_grid=new_params, cv=kf, scoring='recall',
                        return_train_score=True, n_jobs=-1, verbose=2)

grid_imba.fit(X_train, y_train)

Everything works fine and I get results (i.e. I can see the classification report).

However, when I try to look inside the black box with eli5, by calling eli5.explain_weights(imba_pipeline), I get this error:

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'SMOTE(random_state=42)' (type <class 'imblearn.over_sampling._smote.SMOTE'>) doesn't

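I know this is a common issue and I have read the related questions, but I am confused because the problem only appears after my classification procedure has finished.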

Any suggestions?
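For reference, the wording of the error above matches the check that scikit-learn's own Pipeline applies to its intermediate steps: each one must provide transform, whereas a sampler such as SMOTE provides fit_resample instead. The following is a minimal sketch of that check only. ToySampler is a hypothetical stand-in for SMOTE, used so that the example depends on scikit-learn alone, and how eli5 ends up on this code path is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline


class ToySampler:
    """Hypothetical stand-in for a sampler such as SMOTE: it has fit and
    fit_resample, but deliberately no transform method."""

    def fit(self, X, y=None):
        return self

    def fit_resample(self, X, y):
        # A real sampler would return a rebalanced copy of (X, y).
        return X, y


X, y = make_classification(n_samples=60, random_state=42)

error_message = None
try:
    # A plain scikit-learn Pipeline rejects intermediate steps without
    # transform; depending on the version this happens at construction
    # or when fit is called, so both are inside the try block.
    plain_pipeline = Pipeline([
        ("sampler", ToySampler()),
        ("forest", RandomForestClassifier(n_estimators=5, random_state=42)),
    ])
    plain_pipeline.fit(X, y)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

imblearn's Pipeline relaxes this rule for samplers, which would explain why the grid search runs without complaint while code that treats the steps as a plain scikit-learn pipeline does not.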

Answer 1

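Your pipeline has two fitted steps (plus the scaler): the SMOTE augmentation and the random forest. It looks like this confuses eli5, which seems to assume that only the last step is fitted. To get the weight explanation for the random forest, you can try calling eli5 on just that step of the pipeline with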

from eli5 import explain_weights

explain_weights(imba_pipeline['randomforestclassifier'])

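This works if the pipeline itself has been fitted. In your code, however, you fit the grid search, which fits clones and leaves imba_pipeline itself unfitted, so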

explain_weights(grid_imba.best_estimator_['randomforestclassifier'])

would be more appropriate.
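The difference between the original pipeline and the grid search's best_estimator_ can be checked with a small sketch. It rests on several assumptions: the data is synthetic, SMOTE and eli5 are left out so that only scikit-learn is needed, and the forest's own feature_importances_ attribute is inspected in place of explain_weights. Indexing a step by name works the same way on an imblearn pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.utils.validation import check_is_fitted

# Synthetic imbalanced data (roughly 90% / 10%), for illustration only.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.9, 0.1], random_state=42)

pipeline = make_pipeline(RobustScaler(), RandomForestClassifier(random_state=42))
param_grid = {
    "randomforestclassifier__max_depth": [2, 3],
    "randomforestclassifier__n_estimators": [10, 25],
}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
grid = GridSearchCV(pipeline, param_grid=param_grid, cv=cv, scoring="recall")
grid.fit(X, y)

# GridSearchCV fits clones, so the pipeline object passed in stays unfitted.
try:
    check_is_fitted(pipeline["randomforestclassifier"])
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

# The refitted best pipeline does hold a fitted forest.
best_forest = grid.best_estimator_["randomforestclassifier"]
importances = best_forest.feature_importances_

print("original pipeline fitted:", original_is_fitted)
print("importances from best_estimator_:", importances.round(3))
```

The same fitted step, grid.best_estimator_["randomforestclassifier"], is the object to hand to an explanation tool.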

Answer 2

Just wanted to point out that SMOTE usually does not improve prediction quality. See https://arxiv.org/abs/2201.08528

classification

machine-learning

python-3.x

oversampling

imbalanced-data