How to show original feature names in the feature importance plot?
I created an XGBoost model as follows:
y = XY.DELAY_MIN
X = standardized_df
train_X, test_X, train_y, test_y = train_test_split(X.as_matrix(), y.as_matrix(), test_size=0.25)
my_imputer = preprocessing.Imputer()
train_X = my_imputer.fit_transform(train_X)
test_X = my_imputer.transform(test_X)
xgb_model = XGBRegressor()
# Add silent=True to avoid printing out updates with each cycle
xgb_model = XGBRegressor(n_estimators=1000, learning_rate=0.05)
xgb_model.fit(train_X, train_y, early_stopping_rounds=5,
              eval_set=[(test_X, test_y)], verbose=False)
When I create the feature importance plot, the feature names are shown as "f1", "f2", etc. How can I show the original feature names?
fig, ax = plt.subplots(figsize=(12,18))
xgb.plot_importance(xgb_model, max_num_features=30, height=0.8, ax=ax)
plt.show()
The problem is that Imputer does not return a pd.DataFrame as the output of transform(), so the column names are lost when you execute:
train_X = my_imputer.fit_transform(train_X)
test_X = my_imputer.transform(test_X)
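A simple solution is to wrap the imputer output in a DataFrame. This requires train_X and test_X to still be DataFrames at that point (split X and y directly rather than X.as_matrix() and y.as_matrix()), otherwise .columns is not available. For example: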
train_X = pd.DataFrame(my_imputer.fit_transform(train_X), columns=train_X.columns)
test_X = pd.DataFrame(my_imputer.transform(test_X), columns=test_X.columns)
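
For readers on current library versions, here is a minimal sketch of the same idea. It assumes scikit-learn 0.22 or later, where preprocessing.Imputer has been replaced by sklearn.impute.SimpleImputer, and it uses a small made-up DataFrame (the column names "distance" and "dep_hour" are hypothetical) in place of standardized_df and XY.DELAY_MIN.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data for standardized_df and XY.DELAY_MIN.
X = pd.DataFrame({
    "distance": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "dep_hour": [6.0, 7.0, np.nan, 9.0, 10.0, 11.0, 12.0, 13.0],
})
y = pd.Series([5.0, 0.0, 12.0, 3.0, 8.0, 1.0, 7.0, 2.0], name="DELAY_MIN")

# Split the DataFrame itself (no .as_matrix() / .values),
# so train_X and test_X keep their column names.
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

my_imputer = SimpleImputer()  # fills missing values with the column mean by default

# fit_transform() / transform() return NumPy arrays here,
# so wrap them back into DataFrames with the original columns and index.
train_X = pd.DataFrame(
    my_imputer.fit_transform(train_X),
    columns=train_X.columns,
    index=train_X.index,
)
test_X = pd.DataFrame(
    my_imputer.transform(test_X),
    columns=test_X.columns,
    index=test_X.index,
)

print(train_X.columns.tolist())
```

Fitting XGBRegressor on these DataFrames should then let xgb.plot_importance label the bars with the column names, since XGBoost picks up feature names from DataFrame input.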