Gradient descent cost convergence

I have a small script in which the cost converges to zero for the dataset xa and ya, but no matter what values I use for 'iterations' and 'learning_rate', the best cost I can get with the dataset xb and yb is 31.604.

My question is: should the cost always tend toward zero? If so, what am I doing wrong with the dataset xb and yb?

import numpy as np


def gradient_descent(x, y):
    m_curr = b_curr = 0  # start slope and intercept at zero
    iterations = 1250
    n = len(x)
    learning_rate = 0.08

    for i in range(iterations):
        y_predicted = (m_curr * x) + b_curr
        # mean squared error for the current line
        cost = (1/n) * sum([val**2 for val in (y - y_predicted)])
        # partial derivatives of the cost with respect to m and b
        m_der = -(2/n) * sum(x * (y - y_predicted))
        b_der = -(2/n) * sum(y - y_predicted)
        # step both parameters against the gradient
        m_curr = m_curr - (learning_rate * m_der)
        b_curr = b_curr - (learning_rate * b_der)
        print('m {}, b {}, cost {}, iteration {}'.format(m_curr, b_curr, cost, i))


xa = np.array([1, 2, 3, 4, 5])
ya = np.array([5, 7, 9, 11, 13])

# xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
# yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])

gradient_descent(xa, ya)

# gradient_descent(xb, yb)

With xa and ya (iterations and learning_rate as shown above):

m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1245
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1246
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1247
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1248
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1249

With xb and yb (iterations = 1000 and learning_rate = 0.00001):

m 1.0445229983270568, b 0.01691112775956422, cost 31.811378572605147, iteration 995
m 1.0445229675787642, b 0.01691330681124408, cost 31.81137809768319, iteration 996
m 1.044522936830507, b 0.016915485860422623, cost 31.811377622762304, iteration 997
m 1.044522906082285, b 0.016917664907099856, cost 31.811377147842503, iteration 998
m 1.0445228753340983, b 0.01691984395127578, cost 31.811376672923775, iteration 999

With xb and yb (iterations = 200000 and learning_rate = 0.00021):

m 1.017952329085966, b 1.8999054866690825, cost 31.604524796644444, iteration 199995
m 1.0179523238769337, b 1.8999058558198456, cost 31.60452479599536, iteration 199996
m 1.0179523186680224, b 1.89990622496171, cost 31.604524795346318, iteration 199997
m 1.017952313459241, b 1.899906594094676, cost 31.60452479469731, iteration 199998
m 1.017952308250581, b 1.8999069632187437, cost 31.604524794048356, iteration 199999

Glad it helped you understand. Merging the comments into an answer to this question.

Gradient descent always converges toward a local/global minimum. It minimizes the error/cost of computing the output (y) from the input values (x) you provide.

You are solving the equation y = mx + b.

With your xa, ya data it can solve the equation exactly, with an error of ~0 (i.e., both sides of the equation balance). With xb, yb, however, the best it can do is an error of ~31.
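You can verify the first case directly: the line your gradient descent found (m = 2, b = 3) reproduces every point of ya exactly, which is why the cost can reach zero there. A quick check:

```python
import numpy as np

xa = np.array([1, 2, 3, 4, 5])
ya = np.array([5, 7, 9, 11, 13])

# The line found by gradient descent is y = 2x + 3.
# Every residual is zero, so the minimum achievable cost is 0.
residuals = ya - (2 * xa + 3)
print(residuals)  # [0 0 0 0 0]
```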

The cost is nothing but the mean squared error gradient descent reaches while balancing the equation.
Try computing both sides of the equation by hand and it will become clear.

In other words, when you predict y from the x values, the mean squared error for the xa, ya data is 0, while for the xb, yb data it is ~31.6 — and no choice of m and b can do better.
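As a sanity check (a sketch using NumPy's `np.polyfit`, which is not in the original script), you can compute the closed-form least-squares fit for xb, yb. It gives the exact minimizer your gradient descent was slowly approaching, and its cost is the ~31.6045 plateau you observed:

```python
import numpy as np

xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])

# Closed-form least-squares line: polyfit with degree 1 returns [slope, intercept].
m, b = np.polyfit(xb, yb, 1)
mse = np.mean((yb - (m * xb + b)) ** 2)
print(m, b, mse)  # mse ≈ 31.6045 -- the floor your gradient descent hit
```

So 31.604… is not a bug in your loop; it is the irreducible residual of fitting a straight line to data that does not lie on one.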