Feature importance for a bagging classifier, with column names

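I have already referred to these two posts:

Please do not mark this as a duplicate.

I am trying to get the feature names from a bagging classifier, which has no built-in feature importance.

I have the following sample data and code, based on the related posts linked above: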

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
clf = BaggingClassifier(DecisionTreeClassifier())
clf.fit(X, y)

feature_importances = np.mean([tree.feature_importances_ for tree in clf.estimators_], axis=0)

But this outputs only the feature importances (shown below), and I also want the feature names.

feature_importances 
# array([0.15098599, 0.27608213, 0.33606019, 0.23687169])

How can I find the feature names that correspond to these feature importance values?

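You can call load_iris without any arguments, in which case it returns a Bunch object (a dictionary-like object) with several attributes. For your use case, the most relevant ones are bunch.data (the feature matrix), bunch.target (the class labels), and bunch.feature_names (one name per column of the feature matrix, in the same order). Because the averaged importances follow that same column order, you can pair them with the names using zip.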

...

bunch = load_iris()
X = bunch.data
y = bunch.target
feature_names = bunch.feature_names

clf = BaggingClassifier(DecisionTreeClassifier(), random_state=42)
clf.fit(X, y)

feature_importances = np.mean([tree.feature_importances_ for tree in clf.estimators_], axis=0)

output = {fn:fi for fn,fi in zip(feature_names,feature_importances)}
print(output)
{
    'sepal length (cm)': 0.008652347823679744,
    'sepal width (cm)': 0.01945400672681583,
    'petal length (cm)': 0.539297348817521,
    'petal width (cm)': 0.43259629663198346
}
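
If a ranking is wanted rather than a plain dictionary, the name/importance pairs can be sorted. The following is only a sketch under a few assumptions: the helper name bagging_feature_importances is hypothetical (it is not part of scikit-learn), and it uses clf.estimators_features_ to map each tree's importances back to the original column positions. That mapping should only matter when the bagging classifier is configured so that each tree sees a subset of the columns (for example max_features below 1.0); with the defaults used above it should give the same averages as the plain np.mean approach.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier


def bagging_feature_importances(clf, n_features):
    """Average per-tree importances, aligned to the original columns.

    Hypothetical helper: each tree only knows the columns it was trained on,
    so its importances are scattered back via clf.estimators_features_.
    """
    totals = np.zeros(n_features)
    for tree, cols in zip(clf.estimators_, clf.estimators_features_):
        # np.add.at accumulates correctly even if a column index repeats
        # (possible when bootstrap_features=True).
        np.add.at(totals, cols, tree.feature_importances_)
    return totals / len(clf.estimators_)


bunch = load_iris()
clf = BaggingClassifier(DecisionTreeClassifier(), random_state=42)
clf.fit(bunch.data, bunch.target)

importances = bagging_feature_importances(clf, bunch.data.shape[1])

# Pair names with importances and list them from most to least important.
ranked = sorted(
    zip(bunch.feature_names, importances),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, value in ranked:
    print(f"{name}: {value:.4f}")
```

With your own data in a pandas DataFrame, the column names (for example list(df.columns)) would take the place of bunch.feature_names, provided the model was fitted on the columns in that same order.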