Gradient descent cost convergence
I have a small script that converges the cost to zero for the dataset xa and ya, but no matter what values I use for 'iterations' and 'learning_rate', the best cost I can get with the dataset xb and yb is 31.604.
My question is: should the cost always tend towards zero? If so, what am I doing wrong with the dataset xb and yb?
import numpy as np

def gradient_descent(x, y):
    m_curr = b_curr = 0
    iterations = 1250
    n = len(x)
    learning_rate = 0.08
    for i in range(iterations):
        y_predicted = (m_curr * x) + b_curr
        # mean squared error for the current line y = m_curr*x + b_curr
        cost = (1/n) * sum([val**2 for val in (y - y_predicted)])
        # partial derivatives of the cost with respect to m and b
        m_der = -(2/n) * sum(x * (y - y_predicted))
        b_der = -(2/n) * sum(y - y_predicted)
        # step both parameters against the gradient
        m_curr = m_curr - (learning_rate * m_der)
        b_curr = b_curr - (learning_rate * b_der)
        print('m {}, b {}, cost {}, iteration {}'.format(m_curr, b_curr, cost, i))

xa = np.array([1, 2, 3, 4, 5])
ya = np.array([5, 7, 9, 11, 13])
# xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
# yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])

gradient_descent(xa, ya)
# gradient_descent(xb, yb)
With xa and ya (using the iterations and learning_rate values shown above):
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1245
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1246
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1247
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1248
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1249
With xb and yb (iterations = 1000 and learning_rate = 0.00001):
m 1.0445229983270568, b 0.01691112775956422, cost 31.811378572605147, iteration 995
m 1.0445229675787642, b 0.01691330681124408, cost 31.81137809768319, iteration 996
m 1.044522936830507, b 0.016915485860422623, cost 31.811377622762304, iteration 997
m 1.044522906082285, b 0.016917664907099856, cost 31.811377147842503, iteration 998
m 1.0445228753340983, b 0.01691984395127578, cost 31.811376672923775, iteration 999
With xb and yb (iterations = 200000 and learning_rate = 0.00021):
m 1.017952329085966, b 1.8999054866690825, cost 31.604524796644444, iteration 199995
m 1.0179523238769337, b 1.8999058558198456, cost 31.60452479599536, iteration 199996
m 1.0179523186680224, b 1.89990622496171, cost 31.604524795346318, iteration 199997
m 1.017952313459241, b 1.899906594094676, cost 31.60452479469731, iteration 199998
m 1.017952308250581, b 1.8999069632187437, cost 31.604524794048356, iteration 199999
Glad it helped your understanding. Consolidating the comments into an answer to this question.
Gradient descent always moves towards a local/global minimum of the cost. That is, it minimizes the error/cost of computing the output (Y) from the input values (X) you provide.
You are solving the equation y = mx + b.
With your xa, ya data it can solve this exactly, with an error of ~0, i.e. both sides of the equation balance.
But in the case of xb, yb it can only get the error down to ~31.
The cost is nothing but the mean squared error gradient descent ends up with while balancing the equation.
Try calculating both sides of the equation manually and it will become clear.
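For example, a minimal check along those lines, just plugging the converged values from the runs above back into y = mx + b:

import numpy as np

xa = np.array([1, 2, 3, 4, 5])
ya = np.array([5, 7, 9, 11, 13])
# ya is exactly 2*xa + 3, so every residual is 0 and the cost can reach 0
print(ya - (2 * xa + 3))          # [0 0 0 0 0]

xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])
# even at the converged m ≈ 1.018, b ≈ 1.90 the residuals stay non-zero,
# so the mean of their squares cannot drop to 0
print(yb - (1.0179523 * xb + 1.8999070))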
Also, if you predict y from the x values, the mean squared error is ~0 for the xa, ya data... whereas it is ~31 for the xb, yb data.
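To see what the minimum achievable cost actually is, you can compare against the closed-form least-squares fit. A minimal sketch using numpy.polyfit (not part of your script, just a cross-check):

import numpy as np

xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])

# closed-form least-squares line for y = m*x + b
m, b = np.polyfit(xb, yb, 1)
mse = np.mean((yb - (m * xb + b)) ** 2)
print(m, b, mse)

This should land near m ≈ 1.018, b ≈ 1.90 and a cost of ≈ 31.6, the same values your 200000-iteration run converges towards. Since the xb, yb points do not lie exactly on any single line, no choice of iterations or learning_rate can push the cost below that floor.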