What is the Python code to show the feature importance in SVM?

How can I display the important features that contribute to the SVM model, along with their feature names?

My code is shown below.

First I imported the modules:

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix, classification_report

Then I split the data into features and target:

y = df_new[['numeric values']]   # double brackets keep y as a single-column DataFrame
X = df_new.drop('numeric values', axis=1).values

Then I set up the pipeline:

steps = [('scaler', StandardScaler()),
         ('SVM', SVC(kernel='linear'))]

pipeline = Pipeline(steps)

Then I specified my hyperparameter space:

parameters = {'SVM__C': [1, 10, 100],
              'SVM__gamma': [0.1, 0.01]}   # note: gamma has no effect with kernel='linear'

I created a training set and a test set:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=21)

Then I instantiated the GridSearchCV object, cv:

cv = GridSearchCV(pipeline, param_grid=parameters, cv=5)

Then I fit it to the training set:

cv.fit(X_train, y_train.values.ravel())   # ravel() flattens y into the 1-D array scikit-learn expects
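
As a quick sanity check, a fitted GridSearchCV object exposes the winning configuration through its standard best_params_ and best_score_ attributes:

print(cv.best_params_)   # best combination found in the grid above
print(cv.best_score_)    # mean cross-validated score of that combination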

Then I predicted the labels of the test set, y_pred:

y_pred = cv.predict(X_test)

feature_importances = cv.best_estimator_.feature_importances_

The error message I receive:

'Pipeline' object has no attribute 'feature_importances_'
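
The error occurs because neither Pipeline nor SVC has a feature_importances_ attribute; that attribute belongs to tree-based models such as random forests. With kernel='linear', the SVM's fitted coefficients, coef_, play the analogous role, and they can be pulled out of the best pipeline by step name. A minimal sketch, assuming the fitted cv from above and that the remaining columns of df_new are the feature names:

import pandas as pd

# Reach into the refit pipeline for the fitted SVC step (named 'SVM' above);
# coef_ is only defined for kernel='linear'.
svm_step = cv.best_estimator_.named_steps['SVM']

# Square and sum over the rows of coef_ to get one importance per feature
# (for multi-class problems coef_ has one row per pairwise classifier).
importances = (svm_step.coef_ ** 2).sum(axis=0)

feature_names = df_new.drop('numeric values', axis=1).columns
print(pd.Series(importances, index=feature_names).sort_values(ascending=False))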

My understanding of the question: suppose you are building a model with 100 features and you want to know which of them are more important and which are less important.

Just try a univariate feature selection method. It is a very basic approach that you can try first, before refining the method for your data. The sample code is provided by scikit-learn itself; you can modify it to suit your requirements.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets, svm
from sklearn.feature_selection import SelectPercentile, f_classif

###############################################################################
# import some data to play with

# The iris dataset
iris = datasets.load_iris()

# Some noisy data, not correlated with the target
E = np.random.uniform(0, 0.1, size=(len(iris.data), 20))

# Add the noisy data to the informative features
X = np.hstack((iris.data, E))
y = iris.target

###############################################################################
plt.figure(1)
plt.clf()

X_indices = np.arange(X.shape[-1])

###############################################################################
# Univariate feature selection with F-test for feature scoring
# We use the default selection function: the 10% most significant features
selector = SelectPercentile(f_classif, percentile=10)
selector.fit(X, y)
scores = -np.log10(selector.pvalues_)
scores /= scores.max()
plt.bar(X_indices - .45, scores, width=.2,
        label=r'Univariate score ($-Log(p_{value})$)', color='g')

###############################################################################
# Compare to the weights of an SVM
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

svm_weights = (clf.coef_ ** 2).sum(axis=0)
svm_weights /= svm_weights.max()

plt.bar(X_indices - .25, svm_weights, width=.2, label='SVM weight', color='r')

clf_selected = svm.SVC(kernel='linear')
clf_selected.fit(selector.transform(X), y)

svm_weights_selected = (clf_selected.coef_ ** 2).sum(axis=0)
svm_weights_selected /= svm_weights_selected.max()

plt.bar(X_indices[selector.get_support()] - .05, svm_weights_selected,
        width=.2, label='SVM weights after selection', color='b')


plt.title("Comparing feature selection")
plt.xlabel('Feature number')
plt.yticks(())
plt.axis('tight')
plt.legend(loc='upper right')
plt.show()

Code reference: http://scikit-learn.org/0.15/auto_examples/plot_feature_selection.html

Note: for each feature, this code plots the p-value from univariate feature selection alongside the corresponding SVM weight; the features picked by the selection are the ones that also show large SVM weights.
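
Since the question also asks for feature names, the boolean mask from the selector's get_support() can be mapped back to column labels when the data lives in a pandas DataFrame. A minimal sketch on toy data (the frame and column names here are hypothetical stand-ins for df_new):

import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectPercentile, f_classif

# Toy stand-in for df_new: five named features, one binary target
rng = np.random.RandomState(0)
X_df = pd.DataFrame(rng.rand(100, 5), columns=['f1', 'f2', 'f3', 'f4', 'f5'])
target = (X_df['f1'] + 0.5 * rng.rand(100) > 0.75).astype(int)

selector = SelectPercentile(f_classif, percentile=40)
selector.fit(X_df.values, target)

# get_support() is a boolean mask over the original columns,
# so indexing the column labels with it recovers the selected names
print(list(X_df.columns[selector.get_support()]))

# p-values per feature, paired with their names
print(pd.Series(selector.pvalues_, index=X_df.columns).sort_values())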