How to get the number of support vectors after cross validation
This is my code for digit classification with a nonlinear SVM. I apply a cross-validation scheme to select the hyperparameters C and gamma. However, the model returned by GridSearchCV has no n_support_ attribute from which to get the number of support vectors.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC
from sklearn.model_selection import ShuffleSplit
# Loading the Digits dataset
digits = datasets.load_digits()
# To apply a classifier to this data, we need to flatten the images,
# turning the data into a (samples, features) matrix:
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target
# Split the dataset into two equal parts
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.5, random_state=0)
# Initialize an SVM estimator
estimator = SVC(kernel='rbf', C=1, gamma=1)
# Choose the cross-validation iterator.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
# Set the parameters by cross-validation
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4, 1, 2, 10],
                     'C': [1, 10, 50, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
clf = GridSearchCV(estimator=estimator, cv=cv, param_grid=tuned_parameters)
# Run the cross-validation search; afterwards clf is refit on the whole
# training set with the best parameters C and gamma.
clf.fit(X_train, y_train)
print()
print("Best parameters: ")
print(clf.best_params_)
print("score on test set with clf1", clf.score(X_test, y_test))
print("score on training set with clf1", clf.score(X_train, y_train))
# This does not work. So how can I recover the number of support vectors?
print("Number of support vectors per class", clf.n_support_)
## Here is my approach: I train a new SVM with the best parameters and
## observe that it gets the same test and train scores as clf.
clf2 = SVC(C=10, gamma=0.001)
clf2.fit(X_train, y_train)
print("score on test set with clf2", clf2.score(X_test, y_test))
print("score on training set with clf2", clf2.score(X_train, y_train))
print(clf2.n_support_)
Is the approach I propose correct? Any comments?
GridSearchCV fits many models. You can get the best one with clf.best_estimator_. Its n_support_ attribute, clf.best_estimator_.n_support_, holds the number of support vectors for each class, so clf.best_estimator_.n_support_.sum() gives the total number of support vectors, and clf.best_estimator_.support_ gives the indices of the support vectors in your training set.
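For instance, a minimal sketch, continuing from the clf fitted in the question (this relies on GridSearchCV's default refit=True, so that best_estimator_ is available after fit):
best = clf.best_estimator_        # SVC refit on X_train with the winning parameters
print(best.n_support_)            # support vectors per class
print(best.n_support_.sum())      # total number of support vectors
print(best.support_)              # indices of the support vectors in X_train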
You can also get the best model's parameters and its mean cross-validated score with clf.best_params_ and clf.best_score_, respectively.
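And to inspect the winning configuration and its score, for example:
print(clf.best_params_)   # a dict such as {'C': ..., 'gamma': ..., 'kernel': ...}
print(clf.best_score_)    # mean cross-validated score of the best parameter setting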