Much worse performance with RBF kernel than linear in SVM in python scikit-learn

Question

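I am using SVM for some machine learning tasks. I suspect the data is non-linear, so I also included the RBF kernel. I found that the SVM with the RBF kernel performs much worse than the linear SVM. I wonder whether something is wrong with how I specified my classifier parameters.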

My code is as follows:

from sklearn.svm import LinearSVC
from sklearn.svm import SVC

svm1 = LinearSVC() # performs the best, similar to logistic regression results which is expected
svm2 = LinearSVC(class_weight="balanced") # performs somewhat worse than svm1
svm3 = SVC(kernel='rbf', random_state=0, C=1.0, cache_size=4000, class_weight='balanced') # performs way worse than svm1; takes the longest processing time
svm4 = SVC(kernel='rbf', random_state=0, C=1.0, cache_size=4000) # this is the WORST of all, the classifier simply picks the majority class

Answer 1

With the RBF kernel, try tuning the C and gamma parameters. Scikit-learn's grid search will help you with that.

Here is an example to get you started:

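from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

svc = SVC(...)  # fill in your own SVC arguments
params = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01, 0.001]}  # candidate values to try
grid_search = GridSearchCV(svc, params)  # cross-validates every C/gamma combination
grid_search.fit(X, y)  # X, y: your feature matrix and labels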
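
For anyone who wants to run this end to end, a minimal sketch follows. The synthetic data from make_classification is an assumption standing in for the asker's X and y, and cv=5 is an arbitrary choice, not something from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with your own X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Same candidate values as in the snippet above.
params = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01, 0.001]}

# 5-fold cross-validation over all 9 combinations of C and gamma.
grid_search = GridSearchCV(SVC(kernel="rbf"), params, cv=5)
grid_search.fit(X, y)

# Best combination found and its mean cross-validated accuracy.
print(grid_search.best_params_)
print(grid_search.best_score_)
```

If the best values land on the edge of the grid, it is usually worth widening the range in that direction and searching again.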

Answer 2

The following paper is a good guide for SVM users.

A Practical Guide to Support Vector Classification http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

In short, three things are essential for an SVM to perform well.

(1) Feature preparation (feature scaling, categorical feature encoding)
(2) Parameter tuning (coarse and fine-grained cross-validation)
(3) Kernel selection (#features vs. #instances)

The basic idea behind (3) is to select the linear kernel if #features >> #instances. With a small number of instances, an SVM with a non-linear kernel can overfit easily.
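
Points (1) and (2) can be sketched roughly as below. This is only an illustration under assumptions: the data is synthetic, the exponentially spaced grids are one common choice rather than required values, and the names coarse_grid and fine_grid are made up for this example. Scaling is placed inside a pipeline so that it is fitted on the training folds only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with your own X and y.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# (1) Feature scaling, done inside the pipeline so each CV fold
# is scaled using statistics from its own training part only.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# (2a) Coarse search over exponentially spaced values of C and gamma.
coarse_grid = {
    "svc__C": 2.0 ** np.arange(-5, 16, 4),
    "svc__gamma": 2.0 ** np.arange(-15, 4, 4),
}
coarse = GridSearchCV(pipe, coarse_grid, cv=5).fit(X, y)
best_C = coarse.best_params_["svc__C"]
best_gamma = coarse.best_params_["svc__gamma"]

# (2b) Finer search in a neighbourhood of the coarse optimum.
fine_grid = {
    "svc__C": best_C * 2.0 ** np.arange(-2, 3),
    "svc__gamma": best_gamma * 2.0 ** np.arange(-2, 3),
}
fine = GridSearchCV(pipe, fine_grid, cv=5).fit(X, y)

print(fine.best_params_)
print(fine.best_score_)
```

For point (3), the same cross-validation setup can be run with LinearSVC in place of SVC to compare the two kernels on equal footing.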

machine-learning

svm

nonlinear-functions

python-2.7

scikit-learn