What is the need for the Booster object in XGBoost? And how can it be used in scikit-learn's SelectFromModel?
I am trying to use XGBoost for prediction by first extracting the important features and then using them to predict the target. I have used two versions of the code, one that goes through a Booster and one that does not. The feature importances differ between the two cases.
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.01, max_depth=6, reg_alpha=15, n_estimators=1000, subsample=0.5)
xg_reg_1 = xgb.train(params=params, dtrain=data_dmatrix, num_boost_round=300)
Also, when I use the Booster object in SelectFromModel, it throws an error. Kindly let me know what changes should be made to the code.
xgb_fea_imp = pd.DataFrame(list(xg_reg_1.get_fscore().items()), columns=['feature', 'importance']).sort_values('importance', ascending=False)
threshold1 = xgb_fea_imp.T.to_numpy()
from sklearn.feature_selection import SelectFromModel
# select the features
selection = SelectFromModel(xg_reg_1, threshold=threshold1[5], prefit=True)
feature_idx = selection.get_support()
feature_name = X.columns[feature_idx]
selected_dataset = selection.transform(X)
selected_dataset = pd.DataFrame(selected_dataset)
selected_dataset.columns = feature_name
The error is as follows:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-b089dd085f01> in <module>
4 selection = SelectFromModel(xg_reg_1, threshold=threshold1[5], prefit=True)
5
----> 6 feature_idx = selection.get_support()
7 feature_name = X.columns[feature_idx]
8 #print(feature_idx)
~\Anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in get_support(self, indices)
50 values are indices into the input feature vector.
51 """
---> 52 mask = self._get_support_mask()
53 return mask if not indices else np.where(mask)[0]
54
~\Anaconda3\lib\site-packages\sklearn\feature_selection\_from_model.py in _get_support_mask(self)
186 ' "prefit=True" while passing the fitted'
187 ' estimator to the constructor.')
--> 188 scores = _get_feature_importances(
189 estimator=estimator, getter=self.importance_getter,
190 transform_func='norm', norm_order=self.norm_order)
~\Anaconda3\lib\site-packages\sklearn\feature_selection\_base.py in _get_feature_importances(estimator, getter, transform_func, norm_order)
171 getter = attrgetter('feature_importances_')
172 else:
--> 173 raise ValueError(
174 f"when `importance_getter=='auto'`, the underlying "
175 f"estimator {estimator.__class__.__name__} should have "
ValueError: when `importance_getter=='auto'`, the underlying estimator Booster should have `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.
If I go ahead and set prefit=False instead, it asks me to fit the model before use.
You should not build your XGBoost regression model with the core (native) API. The xgb.train function returns a Booster object, which has neither a coef_ nor a feature_importances_ attribute. Use the scikit-learn compatible xgb.XGBRegressor instead; it provides feature_importances_, which can be used in SelectFromModel.
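For illustration, here is a minimal sketch of that scikit-learn-compatible flow, assuming X is the feature DataFrame and y is the target from the question; the 'median' threshold is only a placeholder, and any numeric cut-off can be supplied instead.

import pandas as pd
import xgboost as xgb
from sklearn.feature_selection import SelectFromModel

# Fit the sklearn-compatible regressor (same hyperparameters as in the question).
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.01,
                          max_depth=6, reg_alpha=15, n_estimators=1000, subsample=0.5)
xg_reg.fit(X, y)

# A fitted XGBRegressor exposes feature_importances_, so SelectFromModel accepts it directly.
# 'median' is an illustrative threshold; a numeric value works as well.
selection = SelectFromModel(xg_reg, threshold='median', prefit=True)

feature_idx = selection.get_support()
feature_name = X.columns[feature_idx]
selected_dataset = pd.DataFrame(selection.transform(X), columns=feature_name)

With the selector built this way, the rest of the pipeline from the question (get_support, transform, rebuilding the DataFrame) works unchanged.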