KeyError: 'weight'. Implement XGBoost only on features selected by feature_importance
Using XGBoost feature importance, I obtained the importance scores for the features of my dataframe X_train. X_train originally has 49 features, and XGBoost's feature importance gives me an importance score for each of those 49 features. Now I want to decide how many of these features to actually use in my machine learning model, trying the various thresholds listed in the thresholds array below (one per feature). What minimum threshold should I use for including a feature? Should I include all features with a score above 0.3 or 0.4? When I try this, I get an error:
import xgboost as xgb
from numpy import sort
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score

xgb_model = xgb.XGBClassifier(max_depth=5, learning_rate=0.08, n_jobs=-1).fit(X_train, y_train)
thresholds = sort(xgb_model.feature_importances_)
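If X_train is a pandas DataFrame, it can also help to look at the scores next to the column names before picking a cutoff; a small sketch (assuming a DataFrame with named columns):

import pandas as pd

# Pair each importance score with its column name and sort descending,
# which makes it easier to judge how many features clear a given cutoff.
importances = pd.Series(xgb_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))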
The thresholds for all the features are as follows:
[IN]thresholds
[OUT] array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.00201289, 0.00362736, 0.0036676 , 0.00467797, 0.00532952,
0.00591741, 0.00630169, 0.00661084, 0.00737418, 0.00741502,
0.00748773, 0.00753344, 0.00773079, 0.00852909, 0.00859741,
0.00906814, 0.00929257, 0.00980796, 0.00986394, 0.01056027,
0.01154695, 0.01190695, 0.01203871, 0.01258377, 0.01301482,
0.01383268, 0.01390096, 0.02001457, 0.02699436, 0.03168892,
0.03543754, 0.03578222, 0.13946259, 0.48038903], dtype=float32)
The selection step is meant to keep only the most important features and build select_X_train, a dataset containing just those features.
for thresh in thresholds:
    # select features whose importance is at least the threshold
    selection = SelectFromModel(xgb_model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # train model on the selected features
    selection_model = xgb.XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    # eval model on the same subset of the test features
    select_X_test = selection.transform(X_test)
    y_pred = selection_model.predict(select_X_test)
    predictions = [round(value) for value in y_pred]
    accuracy = accuracy_score(y_test, predictions)
    print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (thresh, select_X_train.shape[1], accuracy * 100.0))
I get the following error:
----> 4 select_X_train = selection.transform(X_train)
KeyError: 'weight'
There is no column named 'weight' in my data. How can I fix this error?
Expected output:
Thresh=0.00201289, n=33, Accuracy: 77.95%
#33 features with threshold above 0.002
Thresh=0.00362736, n=34, Accuracy: 76.38%
#34 features with threshold above 0.003
Thresh=0.0036676 , n=35, Accuracy: 77.56%
#35 features with threshold above 0.003 and so on
So basically, for each threshold, run XGBoost using only the features whose importance score is at least that threshold, and compute the accuracy. For example, in the first case XGBoost would consider all features with a score of at least 0.00201289 and compute the accuracy; the next iteration would consider the features with scores of at least 0.003 and above, and so on.
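In other words, for each threshold I want the model trained on exactly the columns whose importance score is at least that threshold. A rough sketch of that intent, written with a plain NumPy mask instead of SelectFromModel (assuming X_train and X_test are NumPy arrays; with DataFrames the indexing would be X_train.loc[:, mask]):

import xgboost as xgb
from sklearn.metrics import accuracy_score

for thresh in thresholds:
    # columns whose importance score is at least the current threshold
    mask = xgb_model.feature_importances_ >= thresh
    select_X_train = X_train[:, mask]
    select_X_test = X_test[:, mask]
    # train and evaluate on just those columns
    selection_model = xgb.XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    y_pred = selection_model.predict(select_X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Thresh=%f, n=%d, Accuracy: %.2f%%" % (thresh, mask.sum(), accuracy * 100.0))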
I was following a similar tutorial and managed to run this threshold-based feature selection successfully by downgrading to xgboost==0.90. Also, to avoid an annoying warning, use XGBClassifier(objective='reg:squarederror').
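For reference, the downgrade itself is just pip install xgboost==0.90 in the environment the notebook uses; afterwards it is worth confirming which version is actually imported, e.g.:

import xgboost
print(xgboost.__version__)  # should print 0.90 after the downgrade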