在 SVM 中练习内核技巧

Question

我在读SVM的理论。在内核技巧中，我的理解是，如果我们有一个在原始维度 n 中不可线性分离的数据，我们使用内核将数据映射到更高的 space 以线性分离（我们必须选择正确的内核取决于数据集等）。然而，当我看到 Andrew ng Kernel SVM 的这个视频时，我的理解是我们可以将原始数据映射到更小的 space 这让我感到困惑！？任何解释。

你能用一个具体的例子解释一下 RBF 内核如何工作以将每个原始数据样本 x1(x11,x12,x13,....,x1n) 映射到更高的 space （维度 m）成为 X1(X11,X12,X13,...,X1m) 吗？另外，我的理解是内核计算转换后数据的内积（因此在 RBF 之前还有另一个转换，这意味着 RBF 将数据隐式转换为更高的 space 但是如何？）。

其他：内核是一个函数k(x,x1):(R^n)^2->R =g(x).g(x1)，其中g是一个变换函数，在RBF内核的情况下如何定义g？

假设我们在测试集中，我理解的是x是待分类的样本，x1是支持向量（因为只有支持向量会被用来计算超平面）。在 RBF 的情况下 k(x,x1)=exp(-(x-x1)^2/2sigma)，那么改造在哪里呢？

最后一个问题：承认RBF做的是映射到更高维度的m，有可能显示这个m吗？我想看看理论上的现实。

我想用 RBF 内核实现 SVM。这里的m是什么，如何选择？如何在实践中实现内核技巧？

Answer 1

Could you explain me how does RBF kernel work to map each original data sample x1(x11,x12,x13,....,x1n) to a higher space (with dimensions m) to be X1(X11,X12,X13,...,X1m) with a concrete example. Also, what I understand is the kernel compute the inner product of the transformed data (so there is an other transformation before the RBF, which means that RBF transform implicitly the data to a higher space but How?).

正如您所说 - 内核是投影 space 的内积，而不是投影本身。整个技巧是您永远不要转换数据，因为这样做计算成本太高。

other thing: the kernel is a function k(x,x1):(R^n)^2->R =g(x).g(x1), with g is a transformation function, how to define g in the case of RBF kernel?

对于rbf核，g实际上是从R^n到连续函数(L2)的space的映射，每个点映射到均值x，方差sigma^2的非归一化高斯分布.因此（直到我们将删除的一些归一化常数 A）

g(x) = N(x, sigma^2)[z] / A # notice this is not a number but a function of z!

现在函数 space 中的内积是整个域上积的积分因此

K(x, y) = <g(x), g(y)> 
        = INT_{R^n} N(x, sigma^2)[z] N(y, sigma^2)[z] / A^2 dz 
        = B exp(-||x-y||^2 / (2*sigma^2))

其中 B 是某个常数因子（归一化），仅取决于 sigma^2，因此我们可以将其删除（因为缩放在这里并不重要）以简化计算。

Suppose that we are in the test set, What I understand is x is the sample to be classified and x1 is the support vector (because only the support vectors will be used to calculate the hyperplane). in the case of RBF k(x,x1)=exp(-(x-x1)^2/2sigma), so where is the transformation?

如前所述 - 转换从未明确使用过，您只需证明超平面与转换点的内积可以再次表示为具有支持向量的内积，因此你永远不会改变任何东西，只需使用内核

<w, g(x)> = < SUM_{i=1}^N alpha_i y_i g(sv_i), g(x)> 
          = SUM_{i=1}^N alpha_i y_i <g(sv_i), g(x)>
          = SUM_{i=1}^N alpha_i y_i K(sv_i, x)

其中 sv_i 是第 i 个支持向量，alpha_i 是在优化过程中找到的每个样本权重（拉格朗日乘数），y_i 是第 i 个的标签支持向量。

Last question: Admit that the RBF do the mapping to a higher dimension m, it is possible to show this m? I want to see the theoretical reality.

在这种情况下，m 是无穷大，因为您的新 space 是 R^n -> R 域中连续函数的 space，因此是单个向量（函数) 被定义为连续统（实数集的大小）值 - 每个来自 R^n 的可能输入值一个（这是一个简单的集合论结果，对于任何正 n 的 R^n 都是大小连续统）。因此，就纯数学而言，m = |R|，并使用集合论，这就是所谓的 Beth_1（https://en.wikipedia.org/wiki/Beth_number）。

I want to implement SVM with RBF kernel. What is the m here and how to choose it? How to implement kernel trick in practice?

你不用选m，它是内核自己定义的。在实践中实施内核技巧需要以形式表达所有优化例程，其中训练点仅在内积的上下文中使用，并仅用内核调用替换它们。这太复杂了，无法以 SO 形式描述。

在 SVM 中练习内核技巧

Practicing Kernel trick in SVM

kernel

machine-learning

svm

scikit-learn