Gradient Descent Problem with smallest/simplest data on planet Earth

I am trying to implement gradient descent on this simple data, but I am running into a problem. It would be great if someone could point me in the right direction. For x = 6 the answer should be 7, but I am not getting there.

X = [1, 2, 3, 4]
Y = [2, 3, 4, 5]
m_gradient = 0
b_gradient = 0
m, b = 0, 0
learning_rate = 0.1

N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
print(b+m*6)

All of the gradients you compute are incorrect except for the first iteration. You need to reset both gradients to 0 inside the outer for loop.

X = [1, 2, 3, 4]
Y = [2, 3, 4, 5]
m_gradient = 0
b_gradient = 0
m, b = 0, 0
learning_rate = 0.1

N = len(Y)
for p in range(100):
    for idx in range(len(Y)):
        x = X[idx]
        y = Y[idx]
        hyp = (m * x) + b
        m_gradient += -(2/N) * x * (y - hyp)
        b_gradient += -(2/N) * (y - hyp)
    m = m - (m_gradient * learning_rate)
    b = b - (b_gradient * learning_rate)
    m_gradient, b_gradient = 0, 0  # reset the accumulated gradients before the next iteration

print(b+m*6)

Consider b_gradient, for example. Before the first iteration b_gradient = 0, and it is computed as 0 + -0.5*(y0 - (m*x0 + b)) + -0.5*(y1 - (m*x1 + b)) + -0.5*(y2 - (m*x2 + b)) + -0.5*(y3 - (m*x3 + b)), where x0 and y0 are X[0] and Y[0] respectively.

After the first iteration the value of b_gradient is -7, which is correct.

The problem starts with the second iteration. Instead of computing b_gradient as the sum of -0.5*(yn - (m*xn + b)) for 0 <= n <= 3, you compute it as the previous value of b_gradient plus that sum.

After the second iteration the value of b_gradient is -2.6, which is incorrect. The correct value is 4.4; note that 4.4 - 7 = -2.6.
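
To make the arithmetic above concrete, here is a minimal sketch (using the same X, Y, and learning rate as the question) that reproduces these numbers:

X = [1, 2, 3, 4]
Y = [2, 3, 4, 5]
N = len(Y)
learning_rate = 0.1

# First iteration (m = b = 0): accumulating from 0 is still correct here.
m, b = 0, 0
m_grad_1 = sum(-(2 / N) * x * (y - (m * x + b)) for x, y in zip(X, Y))
b_grad_1 = sum(-(2 / N) * (y - (m * x + b)) for x, y in zip(X, Y))
print(b_grad_1)             # -7.0

# Update the parameters exactly as the original loop does.
m -= m_grad_1 * learning_rate
b -= b_grad_1 * learning_rate

# Second iteration: the correct gradient starts again from zero ...
b_grad_2 = sum(-(2 / N) * (y - (m * x + b)) for x, y in zip(X, Y))
print(b_grad_2)             # ≈ 4.4 (the correct value)

# ... whereas the original code keeps accumulating on top of -7.
print(b_grad_1 + b_grad_2)  # ≈ -2.6 (what the buggy loop produces)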

It looks like you want the linear regression coefficients via gradient descent. More data points, a slightly smaller learning rate, and training for more epochs while watching the loss will help reduce the error.

As the input values get larger, the code below gives slightly different results. The approaches mentioned above, such as training for more epochs, will give correct results for a larger range of numbers.

Vectorized version

import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7])
Y = np.array([2, 3, 4, 5, 6, 7, 8])
w_gradient = 0
b_gradient = 0
w, b = 0.5, 0.5

learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)


for i in range(EPOCHS):

    # Predict
    Y_pred = (w * X) + b

    # Loss
    loss = np.square(Y_pred - Y).sum() / (2.0 * N)
    if i % 100 == 0:
        print(loss)

    # Backprop
    grad_y_pred = (2 / N) * (Y_pred - Y)
    w_gradient = (grad_y_pred * X).sum()
    b_gradient = (grad_y_pred).sum()

    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)

print("\n\n")
print("LEARNED:")
print(w, b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))

# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give expected result.
# For a simple demo with less training data and a small input range, 2000 is enough
print(np.round(b + w * 30000))

Output

LEARNED:
1.0000349103409163 0.9998271260509328

TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0
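
As a sanity check, the exact least-squares fit for this data is w = 1, b = 1 (the data is simply y = x + 1); the tiny residual error in the learned coefficients is what makes the prediction drift by 1 at an input as large as 30000. A minimal check with NumPy's polyfit:

import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7])
Y = np.array([2, 3, 4, 5, 6, 7, 8])

# Fit a degree-1 polynomial; returns (slope, intercept).
w_exact, b_exact = np.polyfit(X, Y, 1)
print(w_exact, b_exact)            # 1.0 1.0 (up to floating-point error)
print(b_exact + w_exact * 30000)   # ≈ 30001.0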

Loop version

import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7])
Y = np.array([2, 3, 4, 5, 6, 7, 8])
w_gradient = 0
b_gradient = 0
w, b = 0.5, 0.5

learning_rate = .01
loss = 0
EPOCHS = 2000
N = len(Y)


for i in range(EPOCHS):

    w_gradient = 0
    b_gradient = 0
    loss = 0

    for j in range(N):

        # Predict
        Y_pred = (w * X[j]) + b

        # Loss
        loss += np.square(Y_pred - Y[j]) / (2.0 * N)

        # Backprop
        grad_y_pred = (2 / N) * (Y_pred - Y[j])
        w_gradient += (grad_y_pred * X[j])
        b_gradient += (grad_y_pred)

    # Optimize
    w -= (w_gradient * learning_rate)
    b -= (b_gradient * learning_rate)

    # Print loss
    if i % 100 == 0:
        print(loss)


print("\n\n")
print("LEARNED:")
print(w, b)
print("\n")
print("TEST:")
print(np.round(b + w * (-2)))
print(np.round(b + w * 0))
print(np.round(b + w * 1))
print(np.round(b + w * 6))
print(np.round(b + w * 3000))

# Expected: 30001, but gives 30002.
# Training for 3000 epochs will give expected result.
# For a simple demo with less training data and a small input range, 2000 is enough
print(np.round(b + w * 30000))

Output

LEARNED:
1.0000349103409163 0.9998271260509328

TEST:
-1.0
1.0
2.0
7.0
3001.0
30002.0