XGBoost plot_importance doesn't show feature names

Question

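I'm using plot_importance to show variable importance. Some of my variables are categorical, so I transformed them first. After converting the variable types, the importance plot no longer shows the feature names. My code and the resulting plot are below.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from xgboost import XGBClassifier, plot_importance

dataset = data.values
X = dataset[1:100,0:-2]
predictors=dataset[1:100,-1]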

X = X.astype(str)
encoded_x = None
for i in range(0, X.shape[1]):
    label_encoder = LabelEncoder()
    feature = label_encoder.fit_transform(X[:,i])
    feature = feature.reshape(X.shape[0], 1)
    # Note: scikit-learn >= 1.2 renames this argument to sparse_output
    onehot_encoder = OneHotEncoder(sparse=False)
    feature = onehot_encoder.fit_transform(feature)
    if encoded_x is None:
        encoded_x = feature
    else:
        encoded_x = np.concatenate((encoded_x, feature), axis=1)
print("X shape:", encoded_x.shape)


response='Default'
#predictors=list(data.columns.values[:-1])



# Randomly split indexes
X_train, X_test, y_train, y_test = train_test_split(encoded_x,predictors,train_size=0.7, random_state=5)



model = XGBClassifier()
model.fit(X_train, y_train)


plot_importance(model)
plt.show()

[enter image description here][1]


  [1]: https://i.stack.imgur.com/M9qgY.png

Answer 1

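This is the expected behaviour: sklearn.preprocessing.OneHotEncoder.transform() returns a 2-D numpy array rather than the pd.DataFrame you started from (I assume that is the type of your dataset), so the column names are lost along the way. So it is not a bug, but a feature. It doesn't look like there is a way to pass feature names manually in the sklearn API (it is possible to set them when creating an xgb.DMatrix in the native training API).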

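However, your problem is easily solved by using pd.get_dummies() instead of the LabelEncoder + OneHotEncoder combination you implemented. I don't know why you chose that combination (it can be useful if you also need to handle a test set, but then you need to play extra tricks), so I would suggest going with pd.get_dummies().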

machine-learning

xib

xgboost