Meaning of `penalty` and `loss` in LinearSVC

Anti-closing preamble: I have read the question "difference between penalty and loss parameters in Sklearn LinearSVC library", but I find the answers there not specific enough. Therefore, I am rephrasing the question:

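I am familiar with SVM theory and I am experimenting with the LinearSVC class in Python. However, the documentation is not very clear about the meaning of the `penalty` and `loss` parameters. I assume that `loss` refers to the penalty for points violating the margin (usually denoted by the Greek letter xi or zeta in the objective function), while `penalty` is the norm of the vector determining the class boundary, usually denoted as w. Can anyone confirm or deny this?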

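If my guess is right, then penalty = 'l1' would lead to minimization of the L1-norm of the vector w, as in LASSO regression. How does this relate to the maximum-margin idea of the SVM? Can anyone point me to a publication on this question? In the original paper describing LIBLINEAR I could not find any reference to an L1 penalty.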

Also, if my guess is right, why does LinearSVC not support the combination of penalty='l2' and loss='hinge' (the standard combination in SVC) when dual=False? When I try it, I get

ValueError: Unsupported set of arguments

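Though it is very late, I'll try to give my answer. According to the doc, here is the primal optimization problem considered for LinearSVC: $\underset{w,b}{\min} \frac{1}{2}w^Tw + C\sum_{i=1}^m \max\left\{0, 1-y_i(w^T\phi(x_i) + b)\right\}$, where $\phi$ is the identity mapping, given that LinearSVC only solves linear problems.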

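Actually, this is just one of the possible problems that LinearSVC admits (it is the L2-regularized, L1-loss one in the terms of the LIBLINEAR paper) and not the default one (which is L2-regularized, L2-loss). The LIBLINEAR paper gives a more general formulation of what is called `loss` in Chapter 2, and then elaborates on what is called `penalty` in the Appendix (A2+A4).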

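Basically, it states that LIBLINEAR is meant to solve the following unconstrained optimization problem, $\underset{w}{\min} \frac{1}{2}w^Tw + C\sum_{i=1}^m \xi(w;x_i,y_i)$, with different loss functions $\xi(w;x,y)$ (namely hinge and squared_hinge). The default setting of the model in LIBLINEAR does not consider the bias term, which is why you won't see any reference to b from now on (there are many posts about this).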

$\xi(w;x_i,y_i) = \max\left\{0, 1-y_iw^Tx_i\right\}$, hinge or L1-loss
$\xi(w;x_i,y_i) = \left(\max\left\{0, 1-y_iw^Tx_i\right\}\right)^2$, squared_hinge or L2-loss.
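To make the two losses concrete, here is a minimal sketch in plain Python. It assumes labels in {+1, -1} and no bias term, and the helper names `margin`, `hinge` and `squared_hinge` are purely illustrative (they are not functions of scikit-learn or LIBLINEAR).

```python
def margin(w, x, y):
    """Return y * w^T x for a single sample; y is expected to be +1 or -1."""
    return y * sum(wj * xj for wj, xj in zip(w, x))


def hinge(w, x, y):
    """L1-loss: max(0, 1 - y * w^T x)."""
    return max(0.0, 1.0 - margin(w, x, y))


def squared_hinge(w, x, y):
    """L2-loss: the hinge loss squared."""
    return hinge(w, x, y) ** 2


w = [1.0, -1.0]
print(hinge(w, [2.0, 0.0], +1))          # outside the margin: 0.0
print(hinge(w, [0.5, 0.0], +1))          # inside the margin: 0.5
print(squared_hinge(w, [0.5, 0.0], +1))  # 0.25
print(hinge(w, [1.0, 0.0], -1))          # misclassified: 2.0
```

Note that the squared version penalizes small violations less (0.25 instead of 0.5) and large violations more than the plain hinge.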

Concerning the `penalty`, basically this represents the norm of the vector w used. The appendix elaborates on the different problems:

L2-regularized, L1-loss (penalty='l2', loss='hinge'): $\underset{w}{\min} \frac{1}{2}w^Tw + C\sum_{i=1}^m \max\left\{0, 1-y_i(w^Tx_i)\right\}$
L2-regularized, L2-loss (penalty='l2', loss='squared_hinge'), the default in LinearSVC: $\underset{w}{\min} \frac{1}{2}w^Tw + C\sum_{i=1}^m \left(\max\left\{0, 1-y_i(w^Tx_i)\right\}\right)^2$
L1-regularized, L2-loss (penalty='l1', loss='squared_hinge'): $\underset{w}{\min} \left\| w \right\|_1 + C\sum_{i=1}^m \left(\max\left\{0, 1-y_i(w^Tx_i)\right\}\right)^2$
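The three objectives above differ only in the regularization term and in whether the hinge term is squared. A rough sketch that evaluates them for a fixed w could look like this; `objective` is a hypothetical helper written for illustration, with no bias term and labels assumed to be +1 or -1. It only evaluates the formulas and does not reproduce how LIBLINEAR actually optimizes them.

```python
def objective(w, X, y, C=1.0, penalty="l2", loss="squared_hinge"):
    """Evaluate regularizer + C * sum of losses for a fixed weight vector w."""
    if penalty == "l2":
        reg = 0.5 * sum(wj * wj for wj in w)   # (1/2) w^T w
    elif penalty == "l1":
        reg = sum(abs(wj) for wj in w)         # ||w||_1
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    if loss not in ("hinge", "squared_hinge"):
        raise ValueError("loss must be 'hinge' or 'squared_hinge'")

    total = 0.0
    for xi, yi in zip(X, y):
        h = max(0.0, 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi)))
        total += h if loss == "hinge" else h * h
    return reg + C * total


w = [1.0, -1.0]
X = [[2.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
y = [1, 1, -1]

print(objective(w, X, y, penalty="l2", loss="hinge"))          # 3.5
print(objective(w, X, y, penalty="l2", loss="squared_hinge"))  # 5.25
print(objective(w, X, y, penalty="l1", loss="squared_hinge"))  # 6.25
```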

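Instead, as stated in the documentation, LinearSVC does not support the combination of penalty='l1' and loss='hinge'. As far as I can see, the paper does not specify why, but I found a possible answer here (in the answer by Arun Iyer).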

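Finally, the combination of penalty='l2', loss='hinge', dual=False is not supported, as specified here (it is just not implemented in LIBLINEAR) or here. I am not sure whether that is the case, but from Appendix B onwards the LIBLINEAR paper specifies the optimization problem that is actually solved (which, in the case of L2-regularized, L1-loss, seems to be the dual).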
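One way to check which combinations your installed version accepts is simply to try them all. The sketch below assumes scikit-learn is available; the toy data set and the helper `is_supported` are made up for illustration, and the helper treats any `ValueError` raised during `fit` as "unsupported".

```python
import warnings
from itertools import product

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Small synthetic binary problem, only used to trigger the argument check.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)


def is_supported(penalty, loss, dual):
    """Return True if LinearSVC can be fitted with this combination."""
    clf = LinearSVC(penalty=penalty, loss=loss, dual=dual, max_iter=10000)
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence convergence warnings
            clf.fit(X, y)
    except ValueError:
        return False
    return True


supported = {
    combo: is_supported(*combo)
    for combo in product(("l1", "l2"), ("hinge", "squared_hinge"), (True, False))
}
for (penalty, loss, dual), ok in supported.items():
    status = "supported" if ok else "unsupported"
    print(f"penalty={penalty!r}, loss={loss!r}, dual={dual}: {status}")
```

On the versions I am aware of, this reports penalty='l2' with loss='hinge' as supported only with dual=True, and penalty='l1' with loss='hinge' as unsupported either way, which matches the discussion above.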

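For a theoretical discussion of SVC problems in general, I found that chapter really useful; it shows how the minimization of the norm of w relates to the idea of the maximum margin.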

python

svm

scikit-learn

liblinear