What is the issue with this implementation of gradient descent?
I tried to implement linear regression using gradient descent, but my error blows up to infinity. I have read through my code and still cannot figure out where I went wrong, so I am hoping someone can help me debug why this implementation of linear regression does not work.

There is no problem when N=100, but with N=1000 it diverges to infinity.
import numpy as np

class Regression:
    def __init__(self, xs, ys, w, alpha):
        self.w = w
        self.xs = xs
        self.ys = ys
        self.a = alpha
        self.N = float(len(xs))

    def error(self, ys, yhat):
        return (1./self.N)*np.sum((ys-yhat)**2)

    def propagate(self):
        yhat = self.xs*self.w[0] + self.w[1]
        loss = yhat - self.ys
        r1 = (2./self.N)*np.sum(loss*self.xs)  # gradient w.r.t. the slope w[0]
        r2 = (2./self.N)*np.sum(loss)          # gradient w.r.t. the intercept w[1]
        self.w[0] -= self.a*r1
        self.w[1] -= self.a*r2

N = 600
xs = np.arange(0, N)
bias = np.random.sample(size=N)*10
ys = xs * 2. + 2. + bias
ws = np.array([0., 0.])
regressor = Regression(xs, ys, ws, 0.00001)
for i in range(1000):
    regressor.propagate()
Output:
...
2.71623180177e+286
5.27841816362e+286
1.02574818143e+287
1.99332318715e+287
3.87359919362e+287
7.52751526171e+287
1.46281231441e+288
2.84266426942e+288
5.52411274435e+288
1.07349369184e+289
2.0861064206e+289
4.05390365232e+289
7.87789858657e+289
1.5309018532e+290
2.97498179035e+290
5.78124367308e+290
1.12346161297e+291
2.18320843611e+291
4.24260074438e+291
8.2445912074e+291
1.6021607564e+292
3.11345829619e+292
6.05034327761e+292
1.17575539141e+293
2.28483026006e+293
4.4400811218e+293
8.62835227315e+293
As N grows, the gradient components r1 and r2 at the starting point w=[0,0] scale quadratically and linearly in N, respectively. For a large enough N, the initial step taken along w becomes more than twice the size of the error, so the correction overshoots and actually increases the error. This positive feedback makes w oscillate around the correct values with ever-increasing amplitude instead of converging.

If you shrink alpha by a factor of ten, you will find that N=1000 converges.
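To see that scaling concretely, here is a small sketch (my own, not from the original post) that evaluates r1 and r2 at the starting point w=[0,0] for several values of N, using the same data-generating setup as the question. The roughly quadratic growth of r1 is what eventually makes the step alpha*r1 overshoot:

import numpy as np

# Evaluate the first gradient step at w = [0, 0] for increasing N.
# r1 grows roughly like N**2 and r2 roughly like N, so a fixed alpha
# that is stable for small N overshoots once N is large enough.
for N in (100, 300, 600, 1000):
    xs = np.arange(0, N)
    ys = xs * 2. + 2. + np.random.sample(size=N) * 10
    loss = -ys                       # yhat = 0 at w = [0, 0], so loss = yhat - ys = -ys
    r1 = (2. / N) * np.sum(loss * xs)
    r2 = (2. / N) * np.sum(loss)
    print(N, r1, r2)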
You have exceeded the radius of convergence of your method. I added a print statement at the bottom of propagate to trace the effect:
self.w = np.array(res).astype(np.float)
print self.error(ys, yhat), '\t', r1, '\t', r2, '\t', self.w
As K.A. Buhr points out, r1 scales quadratically with N. Choose your learning rate according to your input; it is not a constant that the SGD algorithm guarantees. Here is the output of the first 20 iterations for N=600, as in your code:
486826.997899 -482786.592791 -1211.52883528 [ 4.82786593 0.01211529]
946024.542374 673013.376697 1680.38708612 [-1.90226784 -0.00468858]
1838377.19732 -938192.956012 -2350.99664804 [ 7.47966172 0.01882138]
3572474.5816 1307858.19046 3268.82617841 [-5.59892018 -0.01386688]
6942323.62211 -1823178.2573 -4565.30975898 [ 12.63286239 0.03178622]
13490907.7204 2541543.91414 6355.61930844 [-12.78257675 -0.03176997]
26216686.5837 -3542958.75828 -8868.35584965 [ 22.64701083 0.05691359]
50946528.2176 4938949.44036 12354.1444796 [-26.74248357 -0.06662786]
99003709.9274 -6884985.98436 -17230.4097511 [ 42.10737627 0.10567624]
192392610.191 9597796.6223 24011.0009034 [-53.87058995 -0.13443377]
373874053.385 -13379504.31 -33480.2810842 [ 79.92445315 0.20036904]
726544597.0 18651274.1534 46663.6193386 [-106.58828839 -0.26626715]
1411884707.51 -26000217.8559 -65058.4461128 [ 153.41389017 0.38431731]
2743697288.89 36244780.0586 90684.1600127 [-209.03391041 -0.52252429]
5331791469.79 -50525887.4157 -126423.886221 [ 296.22496374 0.74171457]
10361201450.4 70434012.7562 176228.707876 [-408.11516382 -1.02057251]
20134788880.2 -98186304.1721 -245674.553107 [ 573.7478779 1.43617302]
39127675046.8 136873506.894 342466.322375 [-794.98719104 -1.9884902 ]
76036305324.8 -190804176.229 -477412.833248 [ 1113.05457125 2.78563813]
147760369643.0 265984517.38 665513.730619 [-1546.79060255 -3.86949918]
However, if you set alpha to 1e-6 (instead of 1e-5), the first rows are:
14495.6359775 -13788.3126768 -211.542964687 [ 0.01378831 0.00021154]
14306.0982004 -13697.7438847 -210.177498646 [ 0.02748606 0.00042172]
14119.0422005 -13607.7699931 -208.821001646 [ 0.04109383 0.00063054]
13934.4354818 -13518.3870942 -207.473414775 [ 0.05461221 0.00083801]
13752.2459738 -13429.5913063 -206.134679506 [ 0.0680418 0.00104415]
13572.4420258 -13341.3787729 -204.804737697 [ 0.08138318 0.00124895]
13394.9924018 -13253.7456628 -203.483531589 [ 0.09463693 0.00145244]
13219.8662747 -13166.6881702 -202.171003801 [ 0.10780362 0.00165461]
13047.0332208 -13080.202514 -200.867097331 [ 0.12088382 0.00185548]
12876.4632151 -12994.2849383 -199.571755548 [ 0.13387811 0.00205505]
12708.1266257 -12908.9317115 -198.284922195 [ 0.14678704 0.00225333]
... and it keeps converging. Incidentally, even at N=600, 1000 iterations are not enough to converge properly; you probably want to stop on an epsilon tolerance rather than a fixed iteration count.
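As a rough illustration of that epsilon idea (a sketch of my own, not code from the original answer), the training loop can stop once the mean squared error stops improving by more than a small tolerance, rather than after a fixed number of iterations. With alpha small enough for the given N (for example 1e-6 at N=600), something like this runs until convergence:

# Hypothetical stopping rule: iterate until the improvement in the
# mean squared error falls below `eps`, with a cap on total iterations.
eps = 1e-9
max_iters = 200000
prev_err = float('inf')
for i in range(max_iters):
    regressor.propagate()
    yhat = regressor.xs * regressor.w[0] + regressor.w[1]
    err = regressor.error(regressor.ys, yhat)
    if prev_err - err < eps:
        break
    prev_err = err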