Gradient descent fails to find the optimal theta, but the normal equation does

Question

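I tried implementing my own linear regression model in Octave with some sample data, but the theta it finds seems incorrect: it does not match the theta from the normal equation, which is correct. However, running my model (with a different alpha and number of iterations) on the data from Andrew Ng's Machine Learning course gives the correct theta for the hypothesis. I have tuned alpha and the number of iterations so that the cost function decreases. Here is the plot of the cost function against iterations:

[Figure: cost J plotted against iterations]

As you can see, the cost decreases and levels off, but it does not get low enough. Can someone help me understand why this happens and what I can do to fix it?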

Here is the data (the first column is x, the second column is y):

20,48
40,55.1
60,56.3
80,61.2
100,68
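For reference, the least-squares line for these five points can be computed by hand from centered sums, and it should agree with the normal equation. This is a sketch with the data typed in directly instead of loaded from data1.txt; it uses the backslash operator in place of `pinv`, which gives the same solution here because `X' * X` is invertible.

```octave
% Closed-form least-squares fit of y = theta(1) + theta(2) * x
% for the five (x, y) points listed above.
x = [20; 40; 60; 80; 100];
y = [48; 55.1; 56.3; 61.2; 68];

% Centered sums: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
xc = x - mean(x);
yc = y - mean(y);
slope = sum(xc .* yc) / sum(xc .^ 2);
intercept = mean(y) - slope * mean(x);
theta = [intercept; slope];

% The normal equation should return the same vector
X = [ones(size(x)), x];
theta_ne = (X' * X) \ (X' * y);

printf("closed form:     theta = [%.4f; %.4f]\n", theta);
printf("normal equation: theta = [%.4f; %.4f]\n", theta_ne);
```

Both lines should print theta = [43.8900; 0.2305]: the optimal intercept is close to 44, far from the starting value theta(1) = 1 used in the script below.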

Here is a graph of the data together with the lines fitted by gradient descent (GD) and by the normal equation (NE):

[Figure: data points with the GD and NE fitted lines]

Main script:

```octave
clear, close all, clc;

% Load the data: first column is x (mass), second column is y (length)
data = load("data1.txt");
X = data(:, 1);
y = data(:, 2);

% Plot the raw data
figure
plot(X, y, 'xr', 'markersize', 7);
xlabel("Mass in kg");
ylabel("Length in cm");

% Add the intercept column and initialize both parameters to 1
X = [ones(length(y), 1), X];
theta = ones(2, 1);

% Run gradient descent
alpha = 0.000001; num_iter = 4000;
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);

% Solve the normal equation
opt_theta_norm = pinv(X' * X) * X' * y;

% Plot the hypotheses found by GD and NE on top of the data
hold on
plot(X(:, 2), X * opt_theta);
plot(X(:, 2), X * opt_theta_norm, 'g-.', "markersize", 10);
legend("Data", "GD", "NE");
hold off

% Plot the cost J recorded after each iteration
figure
plot(1:numel(J_history), J_history);
xlabel("iterations"); ylabel("J");
```

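Gradient descent function:

```octave
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iter)
  % Batch gradient descent for linear regression.
  % X: m x (n+1) design matrix with the intercept column first, y: m x 1 targets.
  m = length(y);
  J_history = zeros(num_iter, 1);
  for iter = 1:num_iter
    % Simultaneous update of all parameters using the vectorized gradient
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = computeCost(X, y, theta);  % record the cost after each step
  endfor
endfunction
```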
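One way to confirm that the update inside the loop really follows the gradient of the cost is a central finite-difference check. This is a sketch on the five data points above; the names `cost`, `analytic`, `numeric`, and the step `h = 1e-4` are illustrative choices, not part of the question's code. If the two columns agree, the implementation is correct and the problem lies in the choice of alpha and the number of iterations.

```octave
% Finite-difference check of the gradient used in gradientDescent:
% (1/m) * X' * (X*theta - y) should match the numerical derivative of
% J(theta) = sum((X*theta - y).^2) / (2*m).
x = [20; 40; 60; 80; 100];
y = [48; 55.1; 56.3; 61.2; 68];
X = [ones(size(x)), x];
m = length(y);

cost = @(t) sum((X * t - y) .^ 2) / (2 * m);   % same formula as computeCost
theta = ones(2, 1);                            % same starting point as the question

analytic = (1 / m) * (X' * (X * theta - y));

h = 1e-4;
numeric = zeros(2, 1);
for k = 1:2
  e = zeros(2, 1);
  e(k) = h;
  % Central difference along parameter k
  numeric(k) = (cost(theta + e) - cost(theta - e)) / (2 * h);
endfor

disp([analytic, numeric]);
rel_err = norm(analytic - numeric) / norm(analytic);
printf("relative error = %g\n", rel_err);
```

Because the cost is quadratic, the central difference is exact up to rounding, so the relative error should be tiny.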

Cost function:

```octave
function J = computeCost (X, y, theta)
  % Squared-error cost: J = sum((X*theta - y).^2) / (2*m)
  m = length(y);
  errors = X * theta - y;
  J = sum(errors .^ 2) / (2 * m);
endfunction
```
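To see how far the plateau is from the optimum, the question's run can be reproduced with the loop inlined (no separate function files) and its final cost compared with the cost at the normal-equation solution. A sketch, assuming the five data points above and the question's settings.

```octave
% Reproduce the question's run: alpha = 1e-6, 4000 iterations, theta = [1; 1],
% then compare the final cost with the cost at the normal-equation solution.
x = [20; 40; 60; 80; 100];
y = [48; 55.1; 56.3; 61.2; 68];
X = [ones(size(x)), x];
m = length(y);
cost = @(t) sum((X * t - y) .^ 2) / (2 * m);

theta = ones(2, 1);
alpha = 1e-6;
for iter = 1:4000
  theta = theta - (alpha / m) * (X' * (X * theta - y));
endfor

theta_ne = (X' * X) \ (X' * y);
printf("GD: theta = [%.4f; %.4f], J = %.4f\n", theta, cost(theta));
printf("NE: theta = [%.4f; %.4f], J = %.4f\n", theta_ne, cost(theta_ne));
```

With these settings the intercept should stay close to its starting value of 1 while the slope grows to compensate, so J levels off in the hundreds even though the normal-equation cost is below 1.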

Answer 1

Try alpha = 0.0001 and num_iter = 400000. That will fix your problem!
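A sketch of these settings on the same five points, starting from theta = [1; 1] as in the question. The loop is inlined and J is not recorded at every step; recording it, as gradientDescent does, also works but makes 400000 iterations slower.

```octave
% The suggested settings: alpha = 1e-4 and 400000 iterations.
x = [20; 40; 60; 80; 100];
y = [48; 55.1; 56.3; 61.2; 68];
X = [ones(size(x)), x];
m = length(y);

theta = ones(2, 1);
alpha = 1e-4;
num_iter = 400000;
for iter = 1:num_iter
  theta = theta - (alpha / m) * (X' * (X * theta - y));
endfor

theta_ne = (X' * X) \ (X' * y);
printf("GD: theta = [%.4f; %.4f]\n", theta);
printf("NE: theta = [%.4f; %.4f]\n", theta_ne);
```

The GD intercept should end up within a few hundredths of the normal-equation value; the remaining gap is almost entirely in the intercept, the direction that converges most slowly.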

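The problem with your code is that the learning rate is too low, which slows down convergence. On top of that, by limiting training to 4000 iterations you are not giving it enough time to converge; that is very few iterations for such a small learning rate.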
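Why alpha cannot simply be made large, and why so many iterations are needed, can be read off the eigenvalues of A = X'*X/m. This uses a standard result for gradient descent on a quadratic cost rather than anything stated above: the step must satisfy alpha < 2/lambda_max to avoid divergence, and the error along the eigenvector of lambda_min shrinks by a factor (1 - alpha*lambda_min) per iteration. A sketch for the question's x values; the 1000x reduction target is an arbitrary choice.

```octave
% Gradient descent on this cost is the linear iteration
%   theta <- theta - alpha * (A*theta - b),  with A = X'*X/m and b = X'*y/m.
% It is stable only for alpha < 2/lambda_max(A), and the error along the
% eigenvector of lambda_min shrinks by (1 - alpha*lambda_min) per step.
x = [20; 40; 60; 80; 100];
X = [ones(size(x)), x];
m = size(X, 1);
A = (X' * X) / m;

lam = sort(eig(A));             % eigenvalues of the symmetric matrix A
lam_min = lam(1);
lam_max = lam(end);
alpha_max = 2 / lam_max;        % larger steps make gradient descent diverge

printf("lambda_min = %.4f, lambda_max = %.1f, ratio = %.0f\n", ...
       lam_min, lam_max, lam_max / lam_min);
printf("stable only for alpha < %.2e\n", alpha_max);

% Iterations needed to shrink the slow (mostly intercept) error component 1000x
alphas = [1e-6, 1e-4];
n_iter = ceil(log(1e-3) ./ log(1 - alphas * lam_min));
for k = 1:numel(alphas)
  printf("alpha = %g: about %d iterations\n", alphas(k), n_iter(k));
endfor
```

Because x runs from 20 to 100 without scaling, lambda_max is more than 20000 times lambda_min, so any alpha small enough to be stable leaves the intercept direction crawling. With alpha = 1e-4 the estimate comes out a little under 400000 iterations, in line with the suggestion above, while alpha = 1e-6 would need tens of millions.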

In short, the problem is a learning rate that is too small combined with too few iterations.
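A different fix, not part of the answer above, is feature scaling: standardize x so that the cost surface is nearly round, run gradient descent with a much larger alpha for a few hundred iterations, then map theta back to the original units. A sketch, assuming z-score scaling with Octave's `std`, alpha = 0.1, and 500 iterations; those values are illustrative choices.

```octave
% Feature scaling before gradient descent, then undo the scaling on theta.
x = [20; 40; 60; 80; 100];
y = [48; 55.1; 56.3; 61.2; 68];
m = length(y);

mu = mean(x);
sigma = std(x);
Xs = [ones(m, 1), (x - mu) / sigma];   % scaled design matrix

theta_s = ones(2, 1);
alpha = 0.1;
for iter = 1:500
  theta_s = theta_s - (alpha / m) * (Xs' * (Xs * theta_s - y));
endfor

% y = t1 + t2*(x - mu)/sigma = (t1 - t2*mu/sigma) + (t2/sigma)*x
theta = [theta_s(1) - theta_s(2) * mu / sigma; theta_s(2) / sigma];
printf("theta = [%.4f; %.4f]\n", theta);
```

Here the scaled columns are orthogonal and Xs'*Xs/m has eigenvalues 1 and 0.8, so alpha = 0.1 shrinks the error by at least 8% per step and 500 iterations are far more than enough to reach the normal-equation theta of [43.89; 0.2305].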

