Perceptron training rule, why multiply by x

I am reading Tom Mitchell's Machine Learning book, and he gives the perceptron training rule as

$$w_i \leftarrow w_i + \Delta w_i$$

where

$$\Delta w_i = \eta\,(t - o)\,x_i$$

with learning rate $\eta$, target output $t$, perceptron output $o$, and input $x_i$. This means that if $x_i$ is very large then so is $\Delta w_i$, but I don't understand the purpose of a large update when $x_i$ is large.

On the contrary, I think that if there is a large $x_i$ then the update should be small, since a small fluctuation in $w_i$ will result in a big change in the final output (due to the large $x_i$).
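
For a concrete sense of the scaling (my own numbers, not from the book): with $\eta = 0.1$ and $t - o = 1$,

$$x_i = 100 \;\Rightarrow\; \Delta w_i = 0.1 \cdot 1 \cdot 100 = 10, \qquad x_i = 0.1 \;\Rightarrow\; \Delta w_i = 0.1 \cdot 1 \cdot 0.1 = 0.01,$$

so the step taken is directly proportional to the size of the input.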

The adjustment is a vector addition/subtraction, which can be thought of as rotating a hyperplane so that class 0 falls on one side and class 1 falls on the other side.

Consider a $1 \times d$ weight vector $\mathbf{w}$ indicating the weights of the perceptron model. Also, consider a $1 \times d$ datapoint $\mathbf{x}$. Then the predicted value of the perceptron model, considering a linear threshold and without loss of generality, will be

$$\hat{y} = \begin{cases} 1 & \text{if } \mathbf{w}\mathbf{x}^T > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{-- Eq. 1}$$

Here '$\cdot$' denotes the dot product, i.e. $\mathbf{w}\mathbf{x}^T = \sum_{i=1}^{d} w_i x_i$.

The hyperplane corresponding to the above equation is

$$\mathbf{w}\mathbf{x}^T = 0$$

(ignoring the iteration index of the weight updates for simplicity).
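
Here is a minimal sketch of Eq. 1 in NumPy; the names `predict`, `w`, and `x` and the sample values are mine, purely for illustration:

```python
import numpy as np

def predict(w, x):
    """Linear threshold unit of Eq. 1: return 1 if w . x^T > 0, else 0."""
    return 1 if np.dot(w, x) > 0 else 0

w = np.array([0.5, -0.2, 0.1])   # 1xd weight vector
x = np.array([1.0, 1.0, 1.0])    # 1xd datapoint
print(predict(w, x))             # w . x^T = 0.4 > 0, so prints 1
```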

Let's say we have two classes, 0 and 1. Again without loss of generality, datapoints labelled 0 fall on the side of the hyperplane where Eq. 1 is $\le 0$, and datapoints labelled 1 fall on the other side, where Eq. 1 is $> 0$.

The vector normal to this hyperplane is $\mathbf{w}$. The angle between $\mathbf{w}$ and the datapoints labelled 0 should be greater than 90 degrees, and the angle between $\mathbf{w}$ and the datapoints labelled 1 should be less than 90 degrees.
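
To see the link between the sign of $\mathbf{w}\mathbf{x}^T$ and the angle, here is a small check with example vectors of my own choosing (assuming NumPy):

```python
import numpy as np

w = np.array([1.0, 0.0])      # normal vector of the hyperplane w . x^T = 0
x0 = np.array([-1.0, 0.5])    # should be class 0: w . x0 < 0
x1 = np.array([1.0, 0.5])     # should be class 1: w . x1 > 0

for x in (x0, x1):
    cos = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x))
    angle = np.degrees(np.arccos(cos))
    print(f"w.x = {np.dot(w, x):+.2f}, angle = {angle:.1f} deg")
# w.x = -1.00, angle = 153.4 deg  (> 90, class 0 side)
# w.x = +1.00, angle = 26.6 deg   (< 90, class 1 side)
```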

There are three possibilities (ignoring the learning rate):

  • $t - o = 0$: implying that this example was classified correctly by the current set of weights. Hence we need no change for this particular datapoint.
  • $t - o = 1$: implying that the target was 1, but the present set of weights classified it as 0. Eq. 1, which was supposed to be $> 0$, was $\le 0$ in this case, which indicates that the angle between $\mathbf{w}$ and $\mathbf{x}$ is greater than 90 degrees, when it should have been less. The update rule is $\mathbf{w} \leftarrow \mathbf{w} + \mathbf{x}$. If you imagine a vector addition in 2d, this rotates the hyperplane so that the angle between $\mathbf{w}$ and $\mathbf{x}$ is closer than before, towards less than 90 degrees (see the sketch after this list).
  • $t - o = -1$: implying that the target was 0, but the present set of weights classified it as 1. Eq. 1, which was supposed to be $\le 0$, was $> 0$ in this case, which indicates that the angle between $\mathbf{w}$ and $\mathbf{x}$ is less than 90 degrees, when it should have been greater. The update rule is $\mathbf{w} \leftarrow \mathbf{w} - \mathbf{x}$. Similarly, this rotates the hyperplane so that the angle between $\mathbf{w}$ and $\mathbf{x}$ becomes greater than 90 degrees.
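
A quick numeric sketch of the second case, with example vectors of my own choosing (assuming NumPy): one application of $\mathbf{w} \leftarrow \mathbf{w} + \mathbf{x}$ pulls $\mathbf{w}$ towards the misclassified $\mathbf{x}$:

```python
import numpy as np

def angle_deg(u, v):
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(cos))

w = np.array([1.0, -1.0])
x = np.array([0.2, 1.0])   # target 1, but w . x = -0.8 <= 0, so predicted 0

print(angle_deg(w, x))     # 123.7 degrees: angle > 90, hence the misclassification
w = w + x                  # update rule for the case t - o = 1
print(angle_deg(w, x))     # 78.7 degrees: w has rotated towards x
```

In general a single update need not flip the sign immediately; it just moves the angle in the right direction.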

This is done iteratively, over and over, rotating and adjusting the hyperplane so that the angle between the hyperplane's normal and the datapoints labelled class 1 becomes less than 90 degrees, and greater than 90 degrees for the datapoints labelled class 0.
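
Putting the three cases together gives a minimal training loop. This is a sketch under my own naming and toy data, with the learning rate set to 1 and the bias folded in as an always-1 input:

```python
import numpy as np

def train(X, t, epochs=100):
    """Perceptron training: rotate w until the hyperplane separates the classes."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) > 0 else 0   # Eq. 1
            if target - o == 1:                # target 1, predicted 0
                w = w + x
                errors += 1
            elif target - o == -1:             # target 0, predicted 1
                w = w - x
                errors += 1
            # target - o == 0: correctly classified, no change
        if errors == 0:                        # every point on the right side
            break
    return w

# Toy linearly separable data; first column is the always-1 bias input
X = np.array([[1.0,  2.0,  1.0],
              [1.0,  1.5,  2.0],
              [1.0, -1.0, -1.5],
              [1.0, -2.0, -1.0]])
t = np.array([1, 1, 0, 0])
print(train(X, t))   # e.g. [1. 2. 1.], a separating hyperplane
```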

If the magnitude of $\mathbf{x}$ is large, there will be a big change in $\mathbf{w}$, and hence it can disrupt the process; depending on the magnitude of the initial weights, it may take more iterations to converge. Therefore it is a good idea to normalize or standardize the datapoints. From this perspective, it is easy to visualise intuitively what exactly the update rule is doing (consider the bias as part of the hyperplane of Eq. 1). Now extend this to more complicated networks and/or with thresholds.
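
For completeness, one common way to standardize the datapoints beforehand (a sketch with made-up values, assuming NumPy; zero mean and unit variance per feature):

```python
import numpy as np

X = np.array([[200.0,  3.0],
              [180.0,  2.5],
              [-150.0, -2.0],
              [-170.0, -3.5]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # each feature now has mean 0, std 1
print(X_std)
```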

Recommended reading and reference: Neural Networks: A Systematic Introduction by Raul Rojas, Chapter 4.