Print feature importance in percentage
I fitted a basic LGBM model in Python.
# Create an instance
LGBM = LGBMRegressor(random_state = 123, importance_type = 'gain') # `split` can also be selected here
# Fit the model (subset of data)
LGBM.fit(X_train_subset, y_train_subset)
# Predict y_pred
y_pred = LGBM.predict(X_test)
I am looking at the documentation:
importance_type (string, optional (default="split")) – How the importance is calculated. If "split", result contains numbers of times the feature is used in a model. If "gain", result contains total gains of splits which use the feature.
I used gain, and it printed the total gains.
# Print features by importance
pd.DataFrame([X_train.columns, LGBM.feature_importances_]).T.sort_values([1], ascending = [True])
0 1
59 SLG_avg_p 0
4 PA_avg 2995.8
0 home 5198.55
26 next_home 11824.2
67 first_time_pitcher 15042.1
etc
I tried:
# get importance
importance = LGBM.feature_importances_
# summarize feature importance
for i, v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i, v))
# plot feature importance
plt.bar([x for x in range(len(importance))], importance)
plt.show()
and received the values and a plot:
Feature: 0, Score: 5198.55005
Feature: 1, Score: 20688.87198
Feature: 2, Score: 49147.90228
Feature: 3, Score: 71734.03088
etc
I also tried:
# feature importance
print(LGBM.feature_importances_)
# plot
plt.bar(range(len(LGBM.feature_importances_)), LGBM.feature_importances_)
plt.show()
How can I print percentages for this model? For some reason, I was sure they would be calculated automatically.
The percentage option is available in the R version but not in the Python one. In Python, you can do the following (using a made-up example, since I don't have your data):
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
from lightgbm import LGBMRegressor
import pandas as pd
X, y = make_regression(n_samples=1000, n_features=10, n_informative=10, random_state=1)
feature_names = [f'Feature {i+1}' for i in range(10)]
X = pd.DataFrame(X, columns=feature_names)
model = LGBMRegressor(importance_type='gain')
model.fit(X, y)
feature_importances = (model.feature_importances_ / sum(model.feature_importances_)) * 100
results = pd.DataFrame({'Features': feature_names,
                        'Importances': feature_importances})
results.sort_values(by='Importances', inplace=True)
plt.barh(results['Features'], results['Importances'])
plt.xlabel('Importance percentages')
plt.show()
Output: a horizontal bar chart of the importance percentages, one bar per feature.