Gradient descent MATLAB script
So I wrote the MATLAB code below as a gradient descent exercise. I deliberately chose a function whose minimum is at (0,0), yet the algorithm throws me toward (-3,3).
I did find that swapping xGrad and yGrad on the line [xGrad,yGrad] = gradient(f); leads to correct convergence, even though xGrad and yGrad come out as roughly 2*X and 2*Y respectively, as expected. I suspect I have reversed something here; I have kept trying to figure out what it is but I can't see it, so I hope someone can spot my mistake...
dx=.01;
dy=.01;
x=-3:dx:3;
y=-3:dy:3;
[X,Y]=meshgrid(x,y);
f=X.^2+Y.^2;
lr = .1; %learning rate
eps = 1e-10; %epsilon threshold
tooMuch = 1e5; %limit iterations
p = [.1 1]; %starting point
[~, idx] = min( abs(x-p(1)) ); %index of closest value
[~, idy] = min( abs(y-p(2)) ); %index of closest value
p = [x(idx) y(idy)]; %closest point to start
[xGrad,yGrad] = gradient(f); %partial derivatives of f
xGrad = xGrad/dx; %scale correction
yGrad = yGrad/dy; %scale correction
for i=1:tooMuch %prevents too many iterations
    fGrad = [ xGrad(idx,idy) , yGrad(idx,idy) ]; %gradient's definition
    pTMP = p(end,:) - lr*fGrad; %gradient descent's core
    [~, idx] = min( abs(x-pTMP(1)) ); %index of closest value
    [~, idy] = min( abs(y-pTMP(2)) ); %index of closest value
    p = [p;x(idx) y(idy)]; %add the new point
    if sqrt( sum( (p(end,:)-p(end-1,:)).^2 ) ) < eps %check convergence
        break
    end
end
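As a sanity check on the observation above (that the scaled numeric gradient comes out as roughly 2*X and 2*Y), the same computation can be reproduced in NumPy, whose meshgrid and gradient follow the same conventions as MATLAB's; this is a sketch for illustration, not part of the original post:

```python
import numpy as np

dx = dy = 0.01
x = np.arange(-3, 3 + dx, dx)
y = np.arange(-3, 3 + dy, dy)
X, Y = np.meshgrid(x, y)   # X varies along columns, Y along rows
f = X**2 + Y**2

# np.gradient takes the spacing per axis and differentiates axis by axis:
# axis 0 runs over rows (the y direction), axis 1 over columns (the x direction)
yGrad, xGrad = np.gradient(f, dy, dx)

# away from the boundary the central differences match the analytic gradient exactly
assert np.allclose(xGrad[1:-1, 1:-1], 2 * X[1:-1, 1:-1])
assert np.allclose(yGrad[1:-1, 1:-1], 2 * Y[1:-1, 1:-1])
```

Note that the x-derivative is the *second* array in the returned pair, because NumPy orders results by axis (rows first), whereas MATLAB's gradient returns the x-derivative first.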
Thanks in advance to anyone who helps.
Edit: corrected the typos and cleaned up the code. It still does the same thing and has the same problem.
The X matrix returned by meshgrid increases its x-values along the columns, not the rows! For example, [X, Y] = meshgrid(-1:1, 1:3) returns

X = [-1 0 1;      Y = [1 1 1;
     -1 0 1;           2 2 2;
     -1 0 1];          3 3 3];

Note that the x-index selects a column of X or Y, while the y-index selects a row. Specifically, your line:
fGrad = [ xGrad(idx,idy) , yGrad(idx,idy) ]; %gradient's definition
should be changed to:
fGrad = [ xGrad(idy,idx) , yGrad(idy,idx) ]; %gradient's definition
The idy variable should index the rows, and the idx variable should index the columns.
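The same row/column convention holds in NumPy's meshgrid, which makes the fix easy to sanity-check interactively; this is a NumPy sketch of the example above, while the post itself uses MATLAB:

```python
import numpy as np

X, Y = np.meshgrid([-1, 0, 1], [1, 2, 3])
# X increases along columns, Y along rows:
# X = [[-1, 0, 1],      Y = [[1, 1, 1],
#      [-1, 0, 1],           [2, 2, 2],
#      [-1, 0, 1]]           [3, 3, 3]]

idx, idy = 2, 0                   # x-index 2 (x = 1), y-index 0 (y = 1)
print(X[idy, idx], Y[idy, idx])   # row = idy, column = idx -> prints: 1 1
```

Indexing as X[idx, idy] instead would walk the grid with x and y transposed, which is exactly the bug in the question.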
In the end I never figured out what was wrong with the earlier approach, but here is an alternative gradient-descent script I used to solve the same problem:
syms x y
f = -20*(x/2-x^2-y^5)*exp(-x^2-y^2); %cost function
% f = x^2+y^2; %simple test function
g = gradient(f, [x, y]);
lr = .01; %learning rate
eps = 1e-10; %convergence threshold
tooMuch = 1e3; %iterations' limit
p = [1.5 -1]; %starting point
for i=1:tooMuch %prevents too many iterations
    pGrad = [subs(g(1),[x y],p(end,:)) subs(g(2),[x y],p(end,:))]; %computes gradient
    pTMP = p(end,:) - lr*pGrad; %gradient descent's core
    p = [p;double(pTMP)]; %adds the new point
    if sum( (p(end,:)-p(end-1,:)).^2 ) < eps %checks convergence
        break
    end
end
v = -3:.1:3; %desired axes
[X, Y] = meshgrid(v,v);
contour(v,v,subs(f,[x y],{X,Y})) %draws the contour lines
hold on
quiver(v,v,subs(g(1), [x y], {X,Y}),subs(g(2), [x y], {X,Y})) %draws the gradient directions
plot(p(:,1),p(:,2)) %draws the route
hold off
suptitle(['gradient descent route from ',mat2str(round(p(1,:),3)),' with \eta=',num2str(lr)])
if i<tooMuch
    title(['converged to ',mat2str(round(p(end,:),3)),' after ',mat2str(i),' steps'])
else
    title(['stopped at ',mat2str(round(p(end,:),3)),' without converging'])
end
Here are just some of the results:
In the latter case you can see that it did not converge, but that is not a failure of gradient descent itself; the learning rate was simply set too high, so the iterates repeatedly overshoot the minimum.
Feel free to use it.
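The learning-rate effect is easy to reproduce numerically: on f(x,y) = x^2 + y^2 the update p <- p - lr*grad(f) multiplies p by (1 - 2*lr) each step, so lr = 0.1 converges while lr > 1 makes the iterates grow. A minimal NumPy sketch (not the original script):

```python
import numpy as np

def descend(p, lr, steps=100):
    """Plain gradient descent on f(x, y) = x^2 + y^2, whose gradient is (2x, 2y)."""
    p = np.array(p, dtype=float)
    for _ in range(steps):
        p = p - lr * 2 * p    # p <- p - lr * grad f(p)
    return p

print(descend([1.5, -1.0], lr=0.1))   # shrinks toward (0, 0)
print(descend([1.5, -1.0], lr=1.1))   # overshoots: |p| grows every step
```

With lr = 1.1 each step flips the sign of p and scales it by 1.2, which matches the overshooting behavior described above.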