What is the Python code to show the feature importance in SVM?

How can I display the important features that contribute to the SVM model, along with their feature names?

My code is shown below.

First I imported the modules:

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix, classification_report

Then I split the data into features and target:

y = df_new[['numeric values']]   # double brackets keep y as a single-column DataFrame
X = df_new.drop('numeric values', axis=1).values

Then I set up the pipeline:

steps = [('scaler', StandardScaler()),
         ('SVM', SVC(kernel='linear'))]

pipeline = Pipeline(steps)

Then I specified my hyperparameter space:

parameters = {'SVM__C': [1, 10, 100],
              'SVM__gamma': [0.1, 0.01]}   # note: gamma has no effect with kernel='linear'

I created a training set and a test set:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=21)

Then I instantiated the GridSearchCV object, cv:

cv = GridSearchCV(pipeline, param_grid=parameters, cv=5)

Then I fit it to the training set:

cv.fit(X_train, y_train.values.ravel())   # ravel() flattens y into the 1-D array scikit-learn expects
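
As a quick sanity check, a fitted GridSearchCV object exposes the winning configuration through its standard best_params_ and best_score_ attributes:

print(cv.best_params_)   # best combination found in the grid above
print(cv.best_score_)    # mean cross-validated score of that combination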

Then I predicted the labels of the test set, y_pred:

y_pred = cv.predict(X_test)

feature_importances = cv.best_estimator_.feature_importances_

The error message I receive:

'Pipeline' object has no attribute 'feature_importances_'
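
The error occurs because neither Pipeline nor SVC has a feature_importances_ attribute; that attribute belongs to tree-based models such as random forests. With kernel='linear', the SVM's fitted coefficients, coef_, play the analogous role, and they can be pulled out of the best pipeline by step name. A minimal sketch, assuming the fitted cv from above and that the remaining columns of df_new are the feature names:

import pandas as pd

# Reach into the refit pipeline for the fitted SVC step (named 'SVM' above);
# coef_ is only defined for kernel='linear'.
svm_step = cv.best_estimator_.named_steps['SVM']

# Square and sum over the rows of coef_ to get one importance per feature
# (for multi-class problems coef_ has one row per pairwise classifier).
importances = (svm_step.coef_ ** 2).sum(axis=0)

feature_names = df_new.drop('numeric values', axis=1).columns
print(pd.Series(importances, index=feature_names).sort_values(ascending=False))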

My understanding of the question: suppose you are building a model with 100 features and you want to know which of them are more important and which are less important.

Just try a univariate feature selection method. It is a very basic approach that you can try first, before refining the method for your data. The sample code is provided by scikit-learn itself; you can modify it to suit your requirements.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets, svm
from sklearn.feature_selection import SelectPercentile, f_classif

###############################################################################
# import some data to play with

# The iris dataset
iris = datasets.load_iris()

# Some noisy data, not correlated with the target
E = np.random.uniform(0, 0.1, size=(len(iris.data), 20))

# Add the noisy data to the informative features
X = np.hstack((iris.data, E))
y = iris.target

###############################################################################
plt.figure(1)
plt.clf()

X_indices = np.arange(X.shape[-1])

###############################################################################
# Univariate feature selection with F-test for feature scoring
# We use the default selection function: the 10% most significant features
selector = SelectPercentile(f_classif, percentile=10)
selector.fit(X, y)
scores = -np.log10(selector.pvalues_)
scores /= scores.max()
plt.bar(X_indices - .45, scores, width=.2,
        label=r'Univariate score ($-Log(p_{value})$)', color='g')

###############################################################################
# Compare to the weights of an SVM
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

svm_weights = (clf.coef_ ** 2).sum(axis=0)
svm_weights /= svm_weights.max()

plt.bar(X_indices - .25, svm_weights, width=.2, label='SVM weight', color='r')

clf_selected = svm.SVC(kernel='linear')
clf_selected.fit(selector.transform(X), y)

svm_weights_selected = (clf_selected.coef_ ** 2).sum(axis=0)
svm_weights_selected /= svm_weights_selected.max()

plt.bar(X_indices[selector.get_support()] - .05, svm_weights_selected,
        width=.2, label='SVM weights after selection', color='b')


plt.title("Comparing feature selection")
plt.xlabel('Feature number')
plt.yticks(())
plt.axis('tight')
plt.legend(loc='upper right')
plt.show()

Code reference: http://scikit-learn.org/0.15/auto_examples/plot_feature_selection.html

Note: for each feature, this code plots the p-value from univariate feature selection alongside the corresponding SVM weight; the features picked by the selection are the ones that also show large SVM weights.
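
Since the question also asks for feature names, the boolean mask from the selector's get_support() can be mapped back to column labels when the data lives in a pandas DataFrame. A minimal sketch on toy data (the frame and column names here are hypothetical stand-ins for df_new):

import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectPercentile, f_classif

# Toy stand-in for df_new: five named features, one binary target
rng = np.random.RandomState(0)
X_df = pd.DataFrame(rng.rand(100, 5), columns=['f1', 'f2', 'f3', 'f4', 'f5'])
target = (X_df['f1'] + 0.5 * rng.rand(100) > 0.75).astype(int)

selector = SelectPercentile(f_classif, percentile=40)
selector.fit(X_df.values, target)

# get_support() is a boolean mask over the original columns,
# so indexing the column labels with it recovers the selected names
print(list(X_df.columns[selector.get_support()]))

# p-values per feature, paired with their names
print(pd.Series(selector.pvalues_, index=X_df.columns).sort_values())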