Cost function diverging in batch gradient descent
I am trying to implement gradient descent in Python. I want the computation to stop when abs(J - J_new) reaches some tolerance level (i.e. convergence), where J is the cost function. The computation should also stop after a set number of iterations. I have tried several different implementations, and in every attempt the cost function actually diverges (i.e. |J - J_new| -> inf). This makes little sense to me, and I cannot figure out why my code behaves this way. I am testing the implementation with 4 trivial data points. I have the file reading commented out for now, but x and y will eventually be read from a text file containing 400+ data points. This is the simplest implementation I could come up with:
# import necessary packages
import numpy as np
import matplotlib.pyplot as plt
'''
For right now, I will hard code all parameters. After all code is written and I know that I implemented the
algorithm correctly, I will condense the code into a single function.
'''
# Trivial data set to test
x = np.array([1, 3, 6, 8])
y = np.array([3, 5, 6, 5])
# Define parameter values
alpha = 0.1
tol = 1e-06
m = y.size
imax = 100000
# Define initial values
theta_0 = np.array([0.0]) # theta_0 guess
theta_1 = np.array([0.0]) # theta_1 guess
J = sum([(theta_0 + theta_1 * x[i] - y[i])**2 for i in range(m)])
# Begin gradient descent algorithm
converged = False
inum = 0
while not converged:
    grad_0 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) for i in range(m)])
    grad_1 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) * x[i] for i in range(m)])
    temp_0 = theta_0 - alpha * grad_0
    temp_1 = theta_1 - alpha * grad_1
    theta_0 = temp_0
    theta_1 = temp_1
    J_new = sum([(theta_0 + theta_1 * x[i] - y[i])**2 for i in range(m)])
    if abs(J - J_new) <= tol:
        print('Converged at iteration', inum)
        converged = True
    J = J_new
    inum = inum + 1
    if inum == imax:
        print('Maximum number of iterations reached!')
        converged = True
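To make the blow-up easier to see, here is a small self-contained sketch (same four data points and the same alpha = 0.1 as above; only the cost bookkeeping and the plot are new) that records J after every update and plots it on a log scale using the matplotlib import that is already there:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 3, 6, 8])
y = np.array([3, 5, 6, 5])
m = y.size
alpha = 0.1

theta_0, theta_1 = 0.0, 0.0
J_history = []
for inum in range(50):  # a few dozen iterations are enough to see the divergence
    grad_0 = (1/m) * np.sum(theta_0 + theta_1 * x - y)
    grad_1 = (1/m) * np.sum((theta_0 + theta_1 * x - y) * x)
    theta_0 = theta_0 - alpha * grad_0
    theta_1 = theta_1 - alpha * grad_1
    J_history.append(np.sum((theta_0 + theta_1 * x - y)**2))

plt.semilogy(J_history)  # log scale, since J grows without bound
plt.xlabel('iteration')
plt.ylabel('cost J')
plt.show()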
I experimented a bit more. The divergence happened because the learning rate alpha was too high. It also helped when I changed the way I check for convergence. Instead of using abs(J - J_new) to check for convergence, I use abs(theta0_new - theta_0) and abs(theta1_new - theta_1). If both of these are within a certain tolerance, it has converged. I also rescaled (normalized) the data, which seems to help as well. Here is the code:
# import necessary packages
import numpy as np
import matplotlib.pyplot as plt
# Gradient descent function
def gradient_descent(x,y,alpha,tol,imax):
    # size of data set
    m = y.size
    # Define initial values
    theta_0 = np.array([0.0]) # theta_0 initial guess
    theta_1 = np.array([0.0]) # theta_1 initial guess
    # Begin gradient descent algorithm
    convergence = False
    inum = 0
    # While loop continues until convergence = True
    while not convergence:
        # Calculate gradients for theta_0 and theta_1
        grad_0 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) for i in range(m)])
        grad_1 = (1/m) * sum([(theta_0 + theta_1 * x[i] - y[i]) * x[i] for i in range(m)])
        # Update theta_0 and theta_1
        temp0 = theta_0 - alpha * grad_0
        temp1 = theta_1 - alpha * grad_1
        theta0_new = temp0
        theta1_new = temp1
        # Check convergence, and stop loop if correct conditions are met
        if abs(theta0_new - theta_0) <= tol and abs(theta1_new - theta_1) <= tol:
            print('We have convergence at iteration', inum, '!')
            convergence = True
        # Update theta_0 and theta_1 for next iteration
        theta_0 = theta0_new
        theta_1 = theta1_new
        # Increment iteration counter
        inum = inum + 1
        # Check iteration number, and stop loop if inum == imax
        if inum == imax:
            print('Maximum number of iterations reached. We have convergence!')
            convergence = True
    # Show result
    print('Slope=', theta_1)
    print('Intercept=', theta_0)
    print('Iteration of convergence=', inum)
# Load data from text file
data = np.loadtxt('InputData.txt')
# Define data set
x = data[:,0]
y = data[:,1]
# Rescale the data
x = x/(max(x)-min(x))
y = y/(max(y)-min(y))
# Define input parameters
alpha = 1e-02
tol = 1e-05
imax = 10000
# Function call
gradient_descent(x, y, alpha, tol, imax)
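For what it is worth, the "alpha was too high" part can be quantified. The (1/m) gradient expressions above are the gradients of the cost (1/(2m)) * sum((theta_0 + theta_1*x[i] - y[i])**2), whose Hessian is (1/m) * A.T @ A, with A the design matrix [1, x]. Plain gradient descent on a quadratic like this is only stable when alpha < 2 / lambda_max, where lambda_max is the largest eigenvalue of that Hessian. A minimal sketch, using the original four-point data set from the question:
import numpy as np

x = np.array([1, 3, 6, 8])
m = x.size

# Design matrix with a column of ones for the intercept theta_0
A = np.column_stack((np.ones(m), x))

# Hessian of the cost whose gradients are the (1/m) sums used above
H = (A.T @ A) / m
lam_max = np.linalg.eigvalsh(H).max()

print('largest eigenvalue:', lam_max)               # roughly 28.2 for this data
print('stable step sizes : alpha <', 2 / lam_max)   # roughly 0.07, so alpha = 0.1 diverges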
I have only checked it with the one data set in that text file.
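A cheap way to check the result beyond that single run is to compare the slope and intercept against NumPy's closed-form least-squares fit on the same rescaled data; a sketch, assuming the same InputData.txt and rescaling as above:
import numpy as np

# Same file and rescaling as above
data = np.loadtxt('InputData.txt')
x = data[:, 0] / (data[:, 0].max() - data[:, 0].min())
y = data[:, 1] / (data[:, 1].max() - data[:, 1].min())

# Degree-1 polynomial fit; np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print('polyfit slope    =', slope)
print('polyfit intercept=', intercept)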