Python - Solve Ax=b using Gradient Descent
I want to solve Ax=b for x, given the matrix A (n×n) and b (n×1), where A is a pentadiagonal matrix, trying different values of n. You can see how they are set up below.
I want to solve this linear system using gradient descent. I have seen that solving Ax=b with this method essentially amounts to minimizing the quadratic function
f(x) = 0.5*x^t*A*x - b^t*x.
The Wikipedia example I saw, for f(x) = x^4 - 3x^3 + 2, would be like this:
next_x = 6  # We start the search at x=6
gamma = 0.01  # Step size multiplier
precision = 0.00001  # Desired precision of result
max_iters = 10000  # Maximum number of iterations

# Derivative function
def df(x):
    return 4 * x ** 3 - 9 * x ** 2

for _i in range(max_iters):
    current_x = next_x
    next_x = current_x - gamma * df(current_x)
    step = next_x - current_x
    if abs(step) <= precision:
        break

print("Minimum at ", next_x)
# The output for the above will be something like
# "Minimum at 2.2499646074278457"
So, is it possible to just replace that return with 0.5*(A+A.T.conj())*x - b, which is the derivative of f(x) = 0.5*x^t*A*x - b^t*x (itself the function we get for Ax=b)? I tried this but did not get the correct result for x. You can see my full code here:
import time
import numpy as np
from math import sqrt
from scipy.linalg import solve_triangular
import math
start_time = time.time()
n=100
################## AAAAA matrix #############################################
A = np.zeros([n, n], dtype=float)  # initialize to zeros
# ------------------first row
A[0][0] = 6
A[0][1] = -4
A[0][2] = 1
# ------------------second row
A[1][0] = -4
A[1][1] = 6
A[1][2] = -4
A[1][3] = 1
# --------------two last rows-----
# n-2 row
A[- 2][- 1] = -4
A[- 2][- 2] = 6
A[- 2][- 3] = -4
A[- 2][- 4] = 1
# n-1 row
A[- 1][- 1] = 6
A[- 1][- 2] = -4
A[- 1][- 3] = 1
# --------------------------- from second to n-2 row --------------------------#
j = 0
for i in range(2, n - 2):
    if j == (n - 4):
        break
    A[i][j] = 1
    j = j + 1
j = 1
for i in range(2, n - 2):
    if j == (n - 3):
        break
    A[i][j] = -4
    j = j + 1
j = 2
for i in range(2, n - 2):
    if j == (n - 2):
        break
    A[i][j] = 6
    j = j + 1
j = 3
for i in range(2, n - 2):
    if j == (n - 1):
        break
    A[i][j] = -4
    j = j + 1
j = 4
for i in range(2, n - 2):
    if j == (n):
        break
    A[i][j] = 1
    j = j + 1
# -----------------------------end coding of 2nd to n-2 r-------------#
print("\nMatrix A is : \n", A)
####### b matrix ######################################
b = np.zeros(n,float).reshape((n,1))
b[0] = 3
b[1] = -1
#b[len(b) - 1] = 3
#b[len(b) - 2] = -1
b[[0,-1]]=3; b[[1,-2]]=-1
############ init x #####################
x = np.zeros(n,float).reshape((n,1))
#x = [0] * n
#x = np.zeros([n, 1], dtype=float)
print("\n x is ",x)
print("\nMatrix b is \n", b)
#####################################
# Derivative function
def df(x):
    a = 0.5 * (A + np.transpose(A))
    res = np.dot(a, x) - b
    return res

def steep(A,b,x):
    next_x = 6  # We start the search at x=6
    gamma = 0.01  # Step size multiplier
    precision = 0.00001  # Desired precision of result
    max_iters = 10000  # Maximum number of iterations
    for _i in range(max_iters):
        current_x = next_x
        next_x = current_x - gamma * df(current_x)
        step = next_x - current_x
        ass = abs(step)
        if ass.any() <= precision:
            break
    print("Minimum at ", next_x)
    return next_x
myx=steep(A,b,x)
print("\n myx is ",myx)
print("--- %s seconds ---" % (time.time() - start_time))
First of all, one way to simplify the construction of your coefficient matrix A would be to use np.diag and to sum each individual k-diagonal matrix, instead of using multiple for loops. Taking n=10 as an example:
import numpy as np
# gradient descent parameters
gamma = 0.01 # step size multiplier
tol = 1e-5 # convergence tolerance for stopping criterion
max_iters = 1e6 # maximum number of iterations
# dimension of the problem
n = 10
A = np.diag(np.ones(n-2), k=-2) + np.diag(-4*np.ones(n-1), k=-1) + \
    np.diag(6*np.ones(n), k=0) + \
    np.diag(-4*np.ones(n-1), k=1) + np.diag(np.ones(n-2), k=2)
b = np.zeros(n)
b[0] = 3 ; b[1] = -1 ; b[n-1] = 3 ; b[n-2] = -1
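As a side note (a sketch of my own, not part of the original answer), for larger n the same pentadiagonal matrix can be built directly in sparse form with scipy.sparse.diags, which broadcasts scalar entries along the diagonals given by offsets:

from scipy.sparse import diags

# same pentadiagonal matrix, stored sparsely; the scalar entries are
# broadcast along the diagonals selected by offsets
A_sparse = diags([1, -4, 6, -4, 1], offsets=[-2, -1, 0, 1, 2], shape=(n, n))

# sanity check against the dense construction above
print(np.allclose(A_sparse.toarray(), A))   # True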
Now, gradient descent (steepest descent) states that the vector x_sol which solves the system Ax=b, where A is positive definite and symmetric, is the minimizer of the quadratic form

def f(A,b,x):
    return 0.5*np.dot(np.dot(x,A),x) - np.dot(x,b)
So make sure you perform proper matrix multiplications (with the np.dot function or the @ operator) rather than Python element-wise multiplications (the * operator). The gradient of this function f evaluates to df such that

def df(A,b,x):
    return np.dot(A,x) - b
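As a quick sanity check (an illustration of my own; fd_gradient below is a hypothetical helper, not part of the original answer), this analytic gradient can be compared against a central finite-difference approximation of f:

def fd_gradient(A, b, x, h=1e-6):
    # central finite-difference approximation of the gradient of f
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(A, b, x + e) - f(A, b, x - e)) / (2*h)
    return g

x_test = np.random.rand(n)
print(np.allclose(df(A, b, x_test), fd_gradient(A, b, x_test), atol=1e-4))   # True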
Note, however, that since A is symmetric here, df simplifies and returns exactly the residual np.dot(A,x)-b instead of 0.5*np.dot(A.T,x) + 0.5*np.dot(A,x) - b. You also need to specify a proper stopping criterion, for instance by testing the Euclidean norm of the step against your tolerance factor tol. Let's run this:
# init: you chose 6 as initial guess
x0 = 6

def gradient_descent(A,b,x0):
    # initial guess vector x
    next_x = x0*np.ones(n)
    # print initial f value
    print('i = 0 ; f(x)= '+str(f(A,b,next_x)))
    i = 1
    # convergence flag
    cvg = False
    print('Starting descent')
    while i <= max_iters:
        curr_x = next_x
        next_x = curr_x - gamma * df(A,b,curr_x)
        step = next_x - curr_x
        # convergence test: relative Euclidean norm of the step
        if np.linalg.norm(step,2)/(np.linalg.norm(next_x,2)+np.finfo(float).eps) <= tol:
            cvg = True
            break
        # optionally print f values while searching for the minimum
        print('i = '+str(i)+' ; f(x)= '+str(f(A,b,next_x)))
        i += 1
    if cvg:
        print('Minimum found in ' + str(i) + ' iterations.')
        print('x_sol =', next_x)
    else:
        print('No convergence for specified parameters.')
    return next_x
which gives:
x_sol = gradient_descent(A,b,x0)
>>> i = 0 ; f(x)= 48.0
>>> Starting descent
>>> i = 1 ; f(x)= 43.20999999999996
>>> i = 2 ; f(x)= 39.17663099999998
>>> i = 3 ; f(x)= 35.75573883350001
>>> i = 4 ; f(x)= 32.83351344396835
>>> i = 5 ; f(x)= 30.319690909679018
>>> i = 6 ; f(x)= 28.1423440799636
>>> ...
>>> i = 19144 ; f(x)= -1.99977775801747
>>> i = 19145 ; f(x)= -1.9997778660326588
>>> i = 19146 ; f(x)= -1.99977797399535
>>> i = 19147 ; f(x)= -1.99977808190557
>>> i = 19148 ; f(x)= -1.999778189763341
>>> Minimum found in 19149 iterations.
>>> x_sol = [1.00991626 1.02509014 1.04110403 1.05415009 1.0614195 1.0614195
1.05415009 1.04110403 1.02509014 1.00991626]
Finally, let's compare with the direct solver np.linalg.solve:
np.allclose(np.linalg.solve(A,b),x_sol)
>>> True
Furthermore, if gradient descent is not mandatory, I would rather use scipy.linalg.solveh_banded, or upgrade this approach for better convergence to the conjugate gradient method, which uses a set of A-orthogonal vectors as search directions together with a good preconditioner (see this excellent pdf, imo). Hope this helps.
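To make those last two suggestions concrete (both snippets are my own sketches, not part of the original answer): scipy.linalg.solveh_banded only needs the upper banded storage of the symmetric matrix,

from scipy.linalg import solveh_banded

# upper banded storage of the symmetric pentadiagonal A:
# row 0 holds the second superdiagonal, row 1 the first
# superdiagonal, row 2 the main diagonal
ab = np.zeros((3, n))
ab[0, 2:] = 1.0
ab[1, 1:] = -4.0
ab[2, :] = 6.0
x_banded = solveh_banded(ab, b)
print(np.allclose(x_banded, np.linalg.solve(A, b)))   # True

and a minimal, unpreconditioned conjugate gradient loop for symmetric positive definite A looks like this:

def conjugate_gradient(A, b, x0, tol=1e-10, max_iters=1000):
    x = x0.copy()
    r = b - A @ x            # initial residual
    p = r.copy()             # first search direction
    rs_old = r @ r
    for _ in range(max_iters):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol:       # residual-based stopping test
            break
        p = r + (rs_new / rs_old) * p    # next A-orthogonal direction
        rs_old = rs_new
    return x

x_cg = conjugate_gradient(A, b, np.zeros(n))
print(np.allclose(x_cg, np.linalg.solve(A, b)))   # True

For this kind of matrix, CG typically converges in far fewer iterations than the plain gradient descent above.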