Perceptron training rule, why multiply by x

I am reading Tom Mitchell's Machine Learning book, and he gives the perceptron training rule as

$$w_i \leftarrow w_i + \Delta w_i$$

where

$$\Delta w_i = \eta\,(t - o)\,x_i$$

with learning rate $\eta$, target output $t$, perceptron output $o$, and input $x_i$. This means that if $x_i$ is very large then so is $\Delta w_i$, but I don't understand the purpose of a large update when $x_i$ is large.

On the contrary, I think that if there is a large $x_i$ then the update should be small, since a small fluctuation in $w_i$ will result in a big change in the final output (due to the large $x_i$).
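
For a concrete sense of the scaling (my own numbers, not from the book): with $\eta = 0.1$ and $t - o = 1$,

$$x_i = 100 \;\Rightarrow\; \Delta w_i = 0.1 \cdot 1 \cdot 100 = 10, \qquad x_i = 0.1 \;\Rightarrow\; \Delta w_i = 0.1 \cdot 1 \cdot 0.1 = 0.01,$$

so the step taken is directly proportional to the size of the input.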

The adjustment is a vector addition/subtraction, which can be thought of as rotating a hyperplane so that class 0 falls on one side and class 1 falls on the other side.

Consider a $1 \times d$ weight vector $\mathbf{w}$ indicating the weights of the perceptron model. Also, consider a $1 \times d$ datapoint $\mathbf{x}$. Then the predicted value of the perceptron model, considering a linear threshold and without loss of generality, will be

$$\hat{y} = \begin{cases} 1 & \text{if } \mathbf{w}\mathbf{x}^T > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{-- Eq. 1}$$

Here '$\cdot$' denotes the dot product, i.e. $\mathbf{w}\mathbf{x}^T = \sum_{i=1}^{d} w_i x_i$.

The hyperplane corresponding to the above equation is

$$\mathbf{w}\mathbf{x}^T = 0$$

(ignoring the iteration index of the weight updates for simplicity).
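
Here is a minimal sketch of Eq. 1 in NumPy; the names `predict`, `w`, and `x` and the sample values are mine, purely for illustration:

```python
import numpy as np

def predict(w, x):
    """Linear threshold unit of Eq. 1: return 1 if w . x^T > 0, else 0."""
    return 1 if np.dot(w, x) > 0 else 0

w = np.array([0.5, -0.2, 0.1])   # 1xd weight vector
x = np.array([1.0, 1.0, 1.0])    # 1xd datapoint
print(predict(w, x))             # w . x^T = 0.4 > 0, so prints 1
```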

Let's say we have two classes, 0 and 1. Again without loss of generality, datapoints labelled 0 fall on the side of the hyperplane where Eq. 1 is $\le 0$, and datapoints labelled 1 fall on the other side, where Eq. 1 is $> 0$.

The vector normal to this hyperplane is $\mathbf{w}$. The angle between $\mathbf{w}$ and the datapoints labelled 0 should be greater than 90 degrees, and the angle between $\mathbf{w}$ and the datapoints labelled 1 should be less than 90 degrees.
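
To see the link between the sign of $\mathbf{w}\mathbf{x}^T$ and the angle, here is a small check with example vectors of my own choosing (assuming NumPy):

```python
import numpy as np

w = np.array([1.0, 0.0])      # normal vector of the hyperplane w . x^T = 0
x0 = np.array([-1.0, 0.5])    # should be class 0: w . x0 < 0
x1 = np.array([1.0, 0.5])     # should be class 1: w . x1 > 0

for x in (x0, x1):
    cos = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x))
    angle = np.degrees(np.arccos(cos))
    print(f"w.x = {np.dot(w, x):+.2f}, angle = {angle:.1f} deg")
# w.x = -1.00, angle = 153.4 deg  (> 90, class 0 side)
# w.x = +1.00, angle = 26.6 deg   (< 90, class 1 side)
```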

There are three possibilities (ignoring the learning rate):

  • $t - o = 0$: implying that this example was classified correctly by the current set of weights. Hence we need no change for this particular datapoint.
  • $t - o = 1$: implying that the target was 1, but the present set of weights classified it as 0. Eq. 1, which was supposed to be $> 0$, was $\le 0$ in this case, which indicates that the angle between $\mathbf{w}$ and $\mathbf{x}$ is greater than 90 degrees, when it should have been less. The update rule is $\mathbf{w} \leftarrow \mathbf{w} + \mathbf{x}$. If you imagine a vector addition in 2d, this rotates the hyperplane so that the angle between $\mathbf{w}$ and $\mathbf{x}$ is closer than before, towards less than 90 degrees (see the sketch after this list).
  • $t - o = -1$: implying that the target was 0, but the present set of weights classified it as 1. Eq. 1, which was supposed to be $\le 0$, was $> 0$ in this case, which indicates that the angle between $\mathbf{w}$ and $\mathbf{x}$ is less than 90 degrees, when it should have been greater. The update rule is $\mathbf{w} \leftarrow \mathbf{w} - \mathbf{x}$. Similarly, this rotates the hyperplane so that the angle between $\mathbf{w}$ and $\mathbf{x}$ becomes greater than 90 degrees.
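
A quick numeric sketch of the second case, with example vectors of my own choosing (assuming NumPy): one application of $\mathbf{w} \leftarrow \mathbf{w} + \mathbf{x}$ pulls $\mathbf{w}$ towards the misclassified $\mathbf{x}$:

```python
import numpy as np

def angle_deg(u, v):
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(cos))

w = np.array([1.0, -1.0])
x = np.array([0.2, 1.0])   # target 1, but w . x = -0.8 <= 0, so predicted 0

print(angle_deg(w, x))     # 123.7 degrees: angle > 90, hence the misclassification
w = w + x                  # update rule for the case t - o = 1
print(angle_deg(w, x))     # 78.7 degrees: w has rotated towards x
```

In general a single update need not flip the sign immediately; it just moves the angle in the right direction.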

This is done iteratively, over and over, rotating and adjusting the hyperplane so that the angle between the hyperplane's normal and the datapoints labelled class 1 becomes less than 90 degrees, and greater than 90 degrees for the datapoints labelled class 0.
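
Putting the three cases together gives a minimal training loop. This is a sketch under my own naming and toy data, with the learning rate set to 1 and the bias folded in as an always-1 input:

```python
import numpy as np

def train(X, t, epochs=100):
    """Perceptron training: rotate w until the hyperplane separates the classes."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) > 0 else 0   # Eq. 1
            if target - o == 1:                # target 1, predicted 0
                w = w + x
                errors += 1
            elif target - o == -1:             # target 0, predicted 1
                w = w - x
                errors += 1
            # target - o == 0: correctly classified, no change
        if errors == 0:                        # every point on the right side
            break
    return w

# Toy linearly separable data; first column is the always-1 bias input
X = np.array([[1.0,  2.0,  1.0],
              [1.0,  1.5,  2.0],
              [1.0, -1.0, -1.5],
              [1.0, -2.0, -1.0]])
t = np.array([1, 1, 0, 0])
print(train(X, t))   # e.g. [1. 2. 1.], a separating hyperplane
```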

If the magnitude of $\mathbf{x}$ is large, there will be a big change in $\mathbf{w}$, and hence it can disrupt the process; depending on the magnitude of the initial weights, it may take more iterations to converge. Therefore it is a good idea to normalize or standardize the datapoints. From this perspective, it is easy to visualise intuitively what exactly the update rule is doing (consider the bias as part of the hyperplane of Eq. 1). Now extend this to more complicated networks and/or with thresholds.
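
For completeness, one common way to standardize the datapoints beforehand (a sketch with made-up values, assuming NumPy; zero mean and unit variance per feature):

```python
import numpy as np

X = np.array([[200.0,  3.0],
              [180.0,  2.5],
              [-150.0, -2.0],
              [-170.0, -3.5]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # each feature now has mean 0, std 1
print(X_std)
```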

Recommended reading and reference: Neural Networks: A Systematic Introduction by Raul Rojas, Chapter 4.