Force RFECV to keep some features
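I am doing feature selection and have been using RFECV to find the optimal number of features.
However, I would like to keep certain features. Is there a way to force the algorithm to keep those selected features and run RFECV only on the remaining ones?
So far I have been running it on all features: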
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RepeatedKFold
from sklearn.tree import DecisionTreeRegressor

def main():
    # csv_file_path and split_data() are defined elsewhere
    df_data = pd.read_csv(csv_file_path, index_col=0)
    X_train, y_train, X_test, y_test = split_data(df_data)
    feats_selection(X_train, y_train, X_test, y_test)

def feats_selection(X_train, y_train, X_test, y_test):
    nr_splits = 10
    nr_repeats = 1
    features_step = 1
    est = DecisionTreeRegressor()
    cv_mode = RepeatedKFold(n_splits=nr_splits, n_repeats=nr_repeats, random_state=1)
    rfecv = RFECV(estimator=est, step=features_step, cv=cv_mode, scoring='neg_mean_squared_error', verbose=0)
    ## >>> here, the RFECV algorithm is automatically selecting the optimal features <<<
    X_train_transformed = rfecv.fit_transform(X_train, y_train)
    X_test_transformed = rfecv.transform(X_test)
    ## test on test subset
    est.fit(X_train_transformed, y_train)
    y_pred = est.predict(X_test_transformed)
    # RMSE; computed without squared=False, which is deprecated since scikit-learn 1.4
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
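No, RFECV has no such parameter.
Perhaps the cleanest way is to use a ColumnTransformer: pass the columns you want to keep through untouched, and let RFECV act as the remainder transformer on everything else: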
from sklearn.compose import ColumnTransformer

cols_to_always_keep = [...]  # column names if you'll fit on dataframe, column indices otherwise
col_sel = ColumnTransformer(
    transformers=[('keep', 'passthrough', cols_to_always_keep)],
    remainder=rfecv,
)
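To make the idea concrete, here is a minimal end-to-end sketch of the ColumnTransformer approach. It uses synthetic data from `make_regression` in place of the asker's CSV, and the column names `f0`..`f7`, the choice of `f6` and `f7` as the always-kept columns, and the 5-fold `KFold` are all assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real data (hypothetical column names f0..f7).
X_arr, y = make_regression(n_samples=120, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Features that must survive selection (assumed for this example).
cols_to_always_keep = ["f6", "f7"]
other_cols = [c for c in X.columns if c not in cols_to_always_keep]

rfecv = RFECV(
    estimator=DecisionTreeRegressor(random_state=0),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="neg_mean_squared_error",
)

# Kept columns pass through unchanged; RFECV only sees the remaining columns.
col_sel = ColumnTransformer(
    transformers=[("keep", "passthrough", cols_to_always_keep)],
    remainder=rfecv,
)

# y must be passed so that RFECV can be fitted inside the ColumnTransformer.
X_sel = col_sel.fit_transform(X, y)

# ColumnTransformer fits a clone, so read results from the fitted copy,
# not from the original rfecv object.
fitted_rfecv = col_sel.named_transformers_["remainder"]
selected_by_rfecv = list(np.array(other_cols)[fitted_rfecv.support_])

print("always kept:      ", cols_to_always_keep)
print("selected by RFECV:", selected_by_rfecv)
print("output shape:     ", X_sel.shape)
```

The output places the always-kept columns first, followed by whatever RFECV selected from the rest. The same fitted `col_sel` can then be applied to held-out data with `col_sel.transform(X_test)`.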