What is the need of a Booster object in XGBoost? Also, how to use it in scikit-learn's SelectFromModel?


I am trying to make predictions with XGBoost by extracting the important features and then using them to predict values. I used two pieces of code, one with a Booster and one without. The feature importances differ between the two cases.
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3,
                          learning_rate=0.01, max_depth=6, reg_alpha=15,
                          n_estimators=1000, subsample=0.5)

xg_reg_1 = xgb.train(params=params, dtrain=data_dmatrix, num_boost_round=300)

Also, if I use the Booster object in SelectFromModel, it throws an error. Kindly let me know the changes to make to the code.

xgb_fea_imp = pd.DataFrame(list(xg_reg_1.get_fscore().items()),
                           columns=['feature', 'importance']).sort_values('importance', ascending=False)
threshold1 = xgb_fea_imp.T.to_numpy()

from sklearn.feature_selection import SelectFromModel
# select the features
selection = SelectFromModel(xg_reg_1, threshold=threshold1[5], prefit=True)
feature_idx = selection.get_support()
feature_name = X.columns[feature_idx]
   
selected_dataset = selection.transform(X)
selected_dataset = pd.DataFrame(selected_dataset)
selected_dataset.columns = feature_name

The error is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-b089dd085f01> in <module>
      4 selection = SelectFromModel(xg_reg_1, threshold=threshold1[5], prefit=True)
      5 
----> 6 feature_idx = selection.get_support()
      7 feature_name = X.columns[feature_idx]
      8 #print(feature_idx)

~\Anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in get_support(self, indices)
     50             values are indices into the input feature vector.
     51         """
---> 52         mask = self._get_support_mask()
     53         return mask if not indices else np.where(mask)[0]
     54 

~\Anaconda3\lib\site-packages\sklearn\feature_selection\_from_model.py in _get_support_mask(self)
    186                              ' "prefit=True" while passing the fitted'
    187                              ' estimator to the constructor.')
--> 188         scores = _get_feature_importances(
    189             estimator=estimator, getter=self.importance_getter,
    190             transform_func='norm', norm_order=self.norm_order)

~\Anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in _get_feature_importances(estimator, getter, transform_func, norm_order)
    171                 getter = attrgetter('feature_importances_')
    172             else:
--> 173                 raise ValueError(
    174                     f"when `importance_getter=='auto'`, the underlying "
    175                     f"estimator {estimator.__class__.__name__} should have "

ValueError: when `importance_getter=='auto'`, the underlying estimator Booster should have `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.

If I go ahead and set prefit=False, it asks me to fit the model before use.

You should not build your XGBoost regression model with the core (native) API. xgb.train returns a Booster object, which has neither a coef_ nor a feature_importances_ attribute. Use the scikit-learn-compatible xgb.XGBRegressor instead; it exposes feature_importances_, which SelectFromModel can consume.