Multivariable gradient descent in Matlab - what is the difference between the two codes?
The following function uses gradient descent to find the optimal "thetas" for a regression line. The input (X, y) is attached below. My question is: what is the difference between code 1 and code 2? Why does code 2 work while code 1 does not?
Thanks in advance!
% GRADIENTDESCENTMULTI Performs gradient descent to learn theta; it
% updates theta by taking num_iters gradient steps with learning rate alpha
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta);
J_history = zeros(num_iters, 1);
costs = zeros(n,1);
for iter = 1:num_iters
    % code 1 - doesn't work
    for c = 1:n
        for i = 1:m
            costs(c) = costs(c) + (X(i,:)*theta - y(i))*X(i,c);
        end
    end
    % code 2 - does work
    E = X * theta - y;
    for c = 1:n
        costs(c) = sum(E.*X(:,c));
    end
    % update each theta
    for c = 1:n
        theta(c) = theta(c) - alpha*costs(c)/m;
    end
    J_history(iter) = computeCostMulti(X, y, theta);
end
end

function J = computeCostMulti(X, y, theta)
for i=1:m
    J = J + (X(i,:)*theta - y(i))^2;
end
J = J/(2*m);
To run the code:
alpha = 0.01;
num_iters = 200;
% Init Theta and Run Gradient Descent
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');
% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');
X is
1.0000 0.1300 -0.2237
1.0000 -0.5042 -0.2237
1.0000 0.5025 -0.2237
1.0000 -0.7357 -1.5378
1.0000 1.2575 1.0904
1.0000 -0.0197 1.0904
1.0000 -0.5872 -0.2237
1.0000 -0.7219 -0.2237
1.0000 -0.7810 -0.2237
1.0000 -0.6376 -0.2237
1.0000 -0.0764 1.0904
1.0000 -0.0009 -0.2237
1.0000 -0.1393 -0.2237
1.0000 3.1173 2.4045
1.0000 -0.9220 -0.2237
1.0000 0.3766 1.0904
1.0000 -0.8565 -1.5378
1.0000 -0.9622 -0.2237
1.0000 0.7655 1.0904
1.0000 1.2965 1.0904
1.0000 -0.2940 -0.2237
1.0000 -0.1418 -1.5378
1.0000 -0.4992 -0.2237
1.0000 -0.0487 1.0904
1.0000 2.3774 -0.2237
1.0000 -1.1334 -0.2237
1.0000 -0.6829 -0.2237
1.0000 0.6610 -0.2237
1.0000 0.2508 -0.2237
1.0000 0.8007 -0.2237
1.0000 -0.2034 -1.5378
1.0000 -1.2592 -2.8519
1.0000 0.0495 1.0904
1.0000 1.4299 -0.2237
1.0000 -0.2387 1.0904
1.0000 -0.7093 -0.2237
1.0000 -0.9584 -0.2237
1.0000 0.1652 1.0904
1.0000 2.7864 1.0904
1.0000 0.2030 1.0904
1.0000 -0.4237 -1.5378
1.0000 0.2986 -0.2237
1.0000 0.7126 1.0904
1.0000 -1.0075 -0.2237
1.0000 -1.4454 -1.5378
1.0000 -0.1871 1.0904
1.0000 -1.0037 -0.2237
y is
399900
329900
369000
232000
539900
299900
314900
198999
212000
242500
239999
347000
329999
699900
259900
449900
299900
199900
499998
599000
252900
255000
242900
259900
573900
249900
464500
469000
475000
299900
349900
169900
314900
579900
285900
249900
229900
345000
549000
287000
368500
329900
314000
299000
179900
299900
239500
I think I got yours working. The main thing is that in code 1 you keep adding to costs(c) but never set it back to zero before the next iteration, so every pass accumulates on top of the gradients from all previous iterations. The only change really needed is to add something like costs(c) = 0; after for c = 1:n and before for i = 1:m, as in the snippet below.
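For reference, the corrected inner loop of code 1 then reads (this is exactly METHOD 1 in the demo further down):

for c = 1:n
    costs(c) = 0;   % reset the accumulator for this component
    for i = 1:m
        costs(c) = costs(c) + (X(i,:)*theta - y(i))*X(i,c);
    end
end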
I did need to change your code slightly to get it to work for me (mostly computeCostMulti), and I changed the plot to show that the two methods give the same result. Overall, here is a piece of working demo code with these changes:
close all; clear; clc;
%% Data
X = [1.0000 0.1300 -0.2237; 1.0000 -0.5042 -0.2237; 1.0000 0.5025 -0.2237; 1.0000 -0.7357 -1.5378;
1.0000 1.2575 1.0904; 1.0000 -0.0197 1.0904; 1.0000 -0.5872 -0.2237; 1.0000 -0.7219 -0.2237;
1.0000 -0.7810 -0.2237; 1.0000 -0.6376 -0.2237; 1.0000 -0.0764 1.0904; 1.0000 -0.0009 -0.2237;
1.0000 -0.1393 -0.2237; 1.0000 3.1173 2.4045; 1.0000 -0.9220 -0.2237; 1.0000 0.3766 1.0904;
1.0000 -0.8565 -1.5378; 1.0000 -0.9622 -0.2237; 1.0000 0.7655 1.0904; 1.0000 1.2965 1.0904;
1.0000 -0.2940 -0.2237; 1.0000 -0.1418 -1.5378; 1.0000 -0.4992 -0.2237; 1.0000 -0.0487 1.0904;
1.0000 2.3774 -0.2237; 1.0000 -1.1334 -0.2237; 1.0000 -0.6829 -0.2237; 1.0000 0.6610 -0.2237;
1.0000 0.2508 -0.2237; 1.0000 0.8007 -0.2237; 1.0000 -0.2034 -1.5378; 1.0000 -1.2592 -2.8519;
1.0000 0.0495 1.0904; 1.0000 1.4299 -0.2237; 1.0000 -0.2387 1.0904; 1.0000 -0.7093 -0.2237;
1.0000 -0.9584 -0.2237; 1.0000 0.1652 1.0904; 1.0000 2.7864 1.0904; 1.0000 0.2030 1.0904;
1.0000 -0.4237 -1.5378; 1.0000 0.2986 -0.2237; 1.0000 0.7126 1.0904; 1.0000 -1.0075 -0.2237;
1.0000 -1.4454 -1.5378; 1.0000 -0.1871 1.0904; 1.0000 -1.0037 -0.2237];
y = [399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999,...
699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900,...
464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000,...
287000 368500 329900 314000 299000 179900 299900 239500]';
alpha = 0.01;
num_iters = 200;
% Init Theta and Run Gradient Descent
theta0 = zeros(3, 1);
[theta_result_1, J_history_1] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 1);
[theta_result_2, J_history_2] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 2);
% Plot the convergence graph for both methods
figure;
x = 1:numel(J_history_1);
subplot(5,1,1:4);
plot(x,J_history_1,x,J_history_2);
xlim([min(x) max(x)]);
set(gca,'XTickLabel','');
ylabel('Cost J');
grid on;
subplot(5,1,5);
stem(x,(J_history_1-J_history_2)./J_history_1,'ko');
xlim([min(x) max(x)]);
xlabel('Number of iterations');
ylabel('frac. \DeltaJ');
grid on;
% Display gradient descent's result
fprintf('Theta computed from gradient descent with method 1: \n');
fprintf(' %f \n', theta_result_1);
fprintf('Theta computed from gradient descent with method 2: \n');
fprintf(' %f \n', theta_result_2);
fprintf('\n');
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters, METHOD)
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta);
J_history = zeros(num_iters, 1);
costs = zeros(n,1);
for iter = 1:num_iters
    if METHOD == 1 % code 1 - does work
        for c = 1:n
            costs(c) = 0;
            for i = 1:m
                costs(c) = costs(c) + (X(i,:)*theta - y(i))*X(i,c);
            end
        end
    elseif METHOD == 2 % code 2 - does work
        E = X * theta - y;
        for c = 1:n
            costs(c) = sum(E.*X(:,c));
        end
    else
        error('unknown method');
    end
    % update each theta
    for c = 1:n
        theta(c) = theta(c) - alpha*costs(c)/m;
    end
    J_history(iter) = computeCostMulti(X, y, theta);
end
end

function J = computeCostMulti(X, y, theta)
m = length(y); J = 0;
for mi = 1:m
    J = J + (X(mi,:)*theta - y(mi))^2;
end
J = J/(2*m);
end
But, again, you really only need to add the costs(c) = 0; line.
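As a side note beyond the original question: both methods compute the batch gradient (1/m)*X'*(X*theta - y), i.e. the update rule theta := theta - (alpha/m) * X' * (X*theta - y), so the per-component loops can be collapsed into a single matrix product. Here is a minimal sketch of that fully vectorized variant (gradientDescentVec is just an illustrative name, not part of the original code); since nothing is accumulated across iterations, the reset bug cannot occur in the first place:

function [theta, J_history] = gradientDescentVec(X, y, theta, alpha, num_iters)
% Fully vectorized gradient descent, equivalent to METHOD 1 and METHOD 2
m = length(y);                            % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    E = X * theta - y;                    % residual vector, m-by-1
    theta = theta - (alpha/m) * (X' * E); % X'*E is the n-by-1 vector 'costs'
    J_history(iter) = sum((X*theta - y).^2) / (2*m); % cost after the update
end
end

Calling [theta_v, J_v] = gradientDescentVec(X, y, theta0, alpha, num_iters) should reproduce theta_result_1/theta_result_2 and both J_history curves from the demo above.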
Also, I recommend always adding a close all; clear; clc; line at the start of your scripts, to make sure they work when copy-pasting them to Stack Overflow.