How to do PCA and SVM for classification in Python

Question

I am doing classification, and I have a list with two elements, like this:

Data=[list1,list2]

list1 has size 1000*784. That is, 1000 images have each been reshaped from 28*28 to 784.

list2 has size 1000*1. It holds the label that each image belongs to. I applied PCA using the following code:

from matplotlib.mlab import PCA
results = PCA(Data[0])

The output looks like this:

Out[40]: <matplotlib.mlab.PCA instance at 0x7f301d58c638>

Now I want to use SVM as the classifier. I need to add the labels, so I have new data for the SVM like this:

newData=[results,Data[1]]

I don't know how to use SVM here.

Answer 1

I think what you are looking for is http://scikit-learn.org/. It's a Python library where you'll find PCA, SVM and other cool algorithms for machine learning. It has a good tutorial, but I recommend you follow this one: http://www.astroml.org/sklearn_tutorial/general_concepts.html . For your particular question, the SVM page of scikit-learn should suffice: http://scikit-learn.org/stable/modules/svm.html

Answer 2

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
# sklearn.cross_validation was removed; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split

Data = [list1, list2]
X = Data[0]            # features, 1000*784
y = np.ravel(Data[1])  # labels, flattened from 1000*1 to 1-D
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
pca = PCA(n_components=2)  # number of components; adjust for your data
pca.fit(X_train)           # fit PCA on the training split only
X_t_train = pca.transform(X_train)
X_t_test = pca.transform(X_test)
clf = SVC()
clf.fit(X_t_train, y_train)
print('score', clf.score(X_t_test, y_test))
print('pred label', clf.predict(X_t_test))
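
The "adjust yourself" comment on n_components leaves open how to pick a value. One common option, sketched here as an illustration rather than as part of the original answer, is to look at the cumulative explained variance. The digits dataset is only a stand-in for the asker's images, and the 95% threshold is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in data (an assumption): 8x8 digit images flattened to 64 features.
X, y = load_digits(return_X_y=True)

# Fit PCA with all components and accumulate the explained variance ratios.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative variance exceeds 95%.
n_95 = int(np.searchsorted(cumulative, 0.95, side='right') + 1)
print('components for 95% variance:', n_95)

# Shortcut: a float in (0, 1) asks PCA to pick the number of components itself.
pca_95 = PCA(n_components=0.95, svd_solver='full').fit(X)
print('chosen by PCA:', pca_95.n_components_)
```

The resulting number can then be passed as n_components before training the SVC. Whether 95% is a good threshold depends on the data, so it is worth comparing classifier scores for a few values.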

Here is the PCA + SVC code tested on another dataset:

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.svm import SVC
# sklearn.cross_validation was removed; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
pca = PCA(n_components=2)  # number of components; adjust for your data
pca.fit(X_train)           # fit PCA on the training split only
X_t_train = pca.transform(X_train)
X_t_test = pca.transform(X_test)
clf = SVC()
clf.fit(X_t_train, y_train)
print('score', clf.score(X_t_test, y_test))
print('pred label', clf.predict(X_t_test))
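
As a possible variation on the code above, the PCA and SVC steps can be chained into a single scikit-learn pipeline and scored with cross-validation instead of one train/test split. This is only a sketch: the choice of 5 folds is an assumption, and the iris data is reused from the example above.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Chain the two steps; PCA is re-fitted on the training part of every fold,
# so the held-out fold never influences the projection.
model = make_pipeline(PCA(n_components=2), SVC())

# 5-fold cross-validation (the fold count is an arbitrary choice here).
scores = cross_val_score(model, X, y, cv=5)
print('fold scores', scores)
print('mean score', scores.mean())
```

The pipeline object also exposes fit and predict directly, so it can replace the separate pca.transform and clf.fit calls when working with a single split.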
