Why is SGDClassifier with hinge loss faster than the SVC implementation in scikit-learn?

Question

As we know, for support vector machines we can use SVC as well as SGDClassifier with hinge loss. Is SGDClassifier with hinge loss faster than SVC? If so, why?

Links to both SVM implementations in scikit-learn:
SVC
SGDClassifier

I read on the scikit-learn documentation pages that SVC uses an algorithm from the libsvm library for optimization, while SGDClassifier uses SGD (obviously).

Answer 1

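Compared with the sklearn SGD classifier with loss 'hinge', the sklearn SVM is computationally more expensive, so we use the SGD classifier, which is faster. This only applies to linear SVMs. If we use the 'rbf' kernel, SGD is not suitable.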
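
To illustrate why each SGD update is cheap, here is a minimal NumPy sketch of stochastic subgradient descent on the L2-regularized hinge loss. It is a simplified illustration, not scikit-learn's actual implementation: the function name `hinge_sgd`, the constant step size `eta`, the value of `alpha`, and the toy data are all assumptions made for the example. Each update touches one sample and costs time proportional to the number of features, and the model is just a weight vector `w`, which is also why this approach is limited to linear decision functions.

```python
import numpy as np

def hinge_sgd(X, y, alpha=0.01, eta=0.01, epochs=5, seed=0):
    """Toy SGD for a linear SVM. Labels in y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order on every pass
        for i in rng.permutation(n_samples):
            margin = y[i] * (X[i] @ w + b)
            # Gradient of the L2 penalty shrinks w on every step
            w *= 1.0 - eta * alpha
            # Hinge loss only contributes when the margin is violated
            if margin < 1.0:
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

# Two well-separated Gaussian blobs as toy data (an assumption for the demo)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (100, 2)), rng.normal(-2.0, 0.5, (100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])

w, b = hinge_sgd(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print("training accuracy:", accuracy)
```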

Answer 2

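I think it is because of the batch size used in SGD. If you use a full batch with the SGD classifier, it should take about the same time as SVM, but changing the batch size can lead to faster convergence.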
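
As a side note on batch size: as far as I know, SGDClassifier has no batch-size parameter and updates its weights one sample at a time. If you want to control how much data it sees per call, `partial_fit` can be fed chunks of the data. A minimal sketch, where the chunk size of 1000 and the random data are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Random features and labels, used only to exercise the API
rng = np.random.default_rng(0)
X = rng.random((20000, 2))
y = rng.integers(0, 2, size=20000)

clf = SGDClassifier(loss="hinge", random_state=0)
classes = np.unique(y)  # partial_fit needs the full label set on its first call
batch_size = 1000       # hypothetical chunk size, not an SGDClassifier parameter

n_chunks = 0
for start in range(0, X.shape[0], batch_size):
    stop = start + batch_size
    # One pass of per-sample SGD updates over this chunk only
    clf.partial_fit(X[start:stop], y[start:stop], classes=classes)
    n_chunks += 1

print("chunks processed:", n_chunks)
```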

Answer 3

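Maybe it is better to start by trying some practical cases and reading the code. Let's start...

First of all, if we read the documentation of SGDC, it says it covers linear SVMs only: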

Linear classifiers (SVM, logistic regression, a.o.) with SGD training

What if, instead of the usual SVC, we use LinearSVC? Its documentation says:

Similar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
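
A small sketch of what "similar to SVC with a linear kernel" means in practice: both estimators learn a linear decision function and expose it through `coef_`. The blob centers, sample count, and `max_iter` value below are assumptions chosen for the demo, not values from the documentation.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

# Two well-separated clusters so that both solvers converge easily
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.5, random_state=0)

svc = SVC(kernel="linear").fit(X, y)                     # libsvm backend
lin = LinearSVC(loss="hinge", max_iter=10000).fit(X, y)  # liblinear backend

svc_acc = svc.score(X, y)
lin_acc = lin.score(X, y)
print("SVC(kernel='linear') accuracy:", svc_acc, "coef shape:", svc.coef_.shape)
print("LinearSVC accuracy:", lin_acc, "coef shape:", lin.coef_.shape)
```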

Let's set up an example with the three algorithms:

from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
import numpy as np

# Random 2-D features with random binary labels (no real signal, timing only)
X = np.random.rand(20000, 2)

# 1-D label array, which is the shape fit() expects
Y = np.random.choice(a=[False, True], size=20000)

# SVC with a linear kernel (libsvm backend); it has no loss parameter
svc = SVC(kernel='linear')

# Linear SVM trained with stochastic gradient descent
sgd = SGDClassifier(loss='hinge')

# Linear SVM with the liblinear backend
svcl = LinearSVC(loss='hinge')

Using Jupyter and the %%time cell magic we get the execution times (you can do something similar in plain Python, but this is how I did it):

%%time
svc.fit(X, Y)

Wall time: 5.61 s

%%time
sgd.fit(X, Y)

Wall time: 24 ms

%%time
svcl.fit(X, Y)

Wall time: 26.5 ms

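As we can see, there is a huge difference between them: SVC is roughly 200 times slower here, while LinearSVC and SGDC take about the same time. The timings always vary a little from run to run, and some difference between the two is expected anyway because each algorithm runs different code.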
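
For anyone not using Jupyter, the same kind of measurement can be sketched in plain Python with `time.perf_counter`. This is only a rough single-run timing, and it uses 2000 samples instead of the 20000 above so that it finishes quickly; the absolute numbers will differ from the ones reported here and from machine to machine.

```python
import time

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(0)
n_samples = 2000  # assumption: reduced size to keep the run short
X = rng.random((n_samples, 2))
y = rng.integers(0, 2, size=n_samples)

models = {
    "SVC(kernel='linear')": SVC(kernel="linear"),
    "SGDClassifier(loss='hinge')": SGDClassifier(loss="hinge"),
    "LinearSVC(loss='hinge')": LinearSVC(loss="hinge"),
}

timings = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    timings[name] = time.perf_counter() - start  # elapsed wall time in seconds

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

For more stable numbers, repeating each fit several times and taking the median would be a reasonable refinement.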

If you are interested in each implementation, I suggest reading the code on GitHub using GitHub's new code-reading tool, which is really great!

LinearSVC code

SGDC code
