Force RFECV to keep some features

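I am doing feature selection and have been using RFECV to find the optimal number of features. However, I want to keep certain features, so I would like to know whether there is any way to force the algorithm to keep those selected features and run RFECV only on the remaining ones.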

So far, I have been running it on all of the features:

import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RepeatedKFold
from sklearn.tree import DecisionTreeRegressor


def main():
    # csv_file_path and split_data are defined elsewhere
    df_data = pd.read_csv(csv_file_path, index_col=0)

    X_train, y_train, X_test, y_test = split_data(df_data)
    feats_selection(X_train, y_train, X_test, y_test)


def feats_selection(X_train, y_train, X_test, y_test):
    nr_splits = 10
    nr_repeats = 1
    features_step = 1
    est = DecisionTreeRegressor()

    cv_mode = RepeatedKFold(n_splits=nr_splits, n_repeats=nr_repeats, random_state=1)
    rfecv = RFECV(estimator=est, step=features_step, cv=cv_mode, scoring='neg_mean_squared_error', verbose=0)

    ## >>> here, the RFECV algorithm is automatically selecting the optimal features <<<
    X_train_transformed = rfecv.fit_transform(X_train, y_train)
    X_test_transformed = rfecv.transform(X_test)

    ## test on test subset
    est.fit(X_train_transformed, y_train)
    y_pred = est.predict(X_test_transformed)
    # RMSE; the square root is taken by hand because `squared=False` is deprecated in recent scikit-learn releases
    rmse = mean_squared_error(y_test, y_pred) ** 0.5

No, RFECV has no such parameter.

Perhaps the cleanest approach is to use ColumnTransformer:

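from sklearn.compose import ColumnTransformer

cols_to_always_keep = [...]  # column names if you'll fit on dataframe, column indices otherwise
col_sel = ColumnTransformer(
    # these columns bypass selection and are copied through unchanged
    transformers=[("keep", "passthrough", cols_to_always_keep)],
    # every other column is handed to the RFECV object from the question
    remainder=rfecv,
)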
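
For a version you can run end to end, here is a minimal sketch of the same idea. The synthetic data from make_regression, the choice of columns 0 and 1 as the always-kept features, and the 5-fold KFold are all assumptions made for illustration; swap in your own data and CV settings.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a hypothetical stand-in for the real dataset.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3, random_state=0)

# Column indices, because X is a NumPy array (use names with a DataFrame).
cols_to_always_keep = [0, 1]

rfecv = RFECV(
    estimator=DecisionTreeRegressor(random_state=0),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="neg_mean_squared_error",
)

col_sel = ColumnTransformer(
    # Kept columns are passed through untouched and come first in the output.
    transformers=[("keep", "passthrough", cols_to_always_keep)],
    # All remaining columns are filtered by RFECV.
    remainder=rfecv,
)

# y is forwarded to RFECV, which needs it to score feature subsets.
X_sel = col_sel.fit_transform(X, y)

# The fitted RFECV is stored under the name "remainder".
fitted_rfecv = col_sel.named_transformers_["remainder"]
n_kept = len(cols_to_always_keep)
n_selected = int(fitted_rfecv.n_features_)

print("kept columns:", n_kept)
print("columns selected by RFECV:", n_selected)
print("output shape:", X_sel.shape)
```

After fitting, col_sel behaves like any other transformer, so col_sel.transform(X_test) applies the same column choice to held-out data, and it can also sit as a step in a Pipeline ahead of the final regressor. One thing to be aware of: RFECV only ever sees the remaining columns here, so its feature ranking is computed without the always-kept features in the model.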