Why is the code I wrote for Andrew Ng's course not accepted?

Question

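Andrew Ng's course on Coursera, which is Stanford's Machine Learning course, features programming assignments that involve implementing the algorithms taught in class. The goal of this assignment is to implement linear regression via gradient descent, with inputs X, y, theta, alpha (learning rate), and the number of iterations.

I implemented this solution in Octave, the language prescribed by the course.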

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

m = length(y);
J_history = zeros(num_iters, 1);

numJ = size(theta, 1);

for iter = 1:num_iters
    for i = 1:m
        for j = 1:numJ
            temp = theta(j) - alpha / m * X(i, j) * (((X * theta)(i, 1)) - y(i, 1));
            theta(j) = temp
        end
        prediction = X * theta;
        J_history(iter, 1) = computeCost(X, y, theta)
    end
end

And this is the cost function:

function J = computeCost(X, y, theta)

m = length(y);
J = 0;

prediction = X * theta;
error = (prediction - y).^2;
J = 1/(2 * m) .* sum(error);

end

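This does not pass the submit() function. The submit() function simply validates the result by passing in an unknown test case.

I have looked at other questions on Stack Overflow, but I really don't understand them. :)

Thanks a lot!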

Answer 1

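Your compute-cost code is correct. For the gradient descent itself, it is better to follow the vectorized implementation. You are just iterating, which is slow and may contain bugs.

The course intends for you to write the vectorized implementation because it is simple and convenient. I know this because I only got there after a lot of sweat. Vectorization is good :)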

Answer 2

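Your gradient seems to be correct and, as already pointed out in the answer given by @Kasinath P, the problem is most likely that the code is too slow. You just need to vectorize it. In Matlab/Octave you usually want to avoid for loops (note that although Matlab has parfor, Octave does not have it yet). So, performance-wise, it is always better to write something like A*x than to iterate over each row of A with a for loop. You can read about vectorization here.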
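As a small illustration of the point about A*x, the sketch below computes the same matrix-vector product once with a loop over the rows and once as a single product, and times both. The matrix A, the vector x, and their sizes are made up for this example, and the actual timings will depend on the machine and the Octave version.

```octave
% Hypothetical data, chosen only for illustration.
A = rand(2000, 50);
x = rand(50, 1);

% Loop version: one dot product per row of A.
tic;
y_loop = zeros(size(A, 1), 1);
for i = 1:size(A, 1)
    y_loop(i) = A(i, :) * x;
end
t_loop = toc;

% Vectorized version: a single matrix-vector product.
tic;
y_vec = A * x;
t_vec = toc;

printf('loop: %g s, vectorized: %g s\n', t_loop, t_vec);
printf('max difference: %g\n', max(abs(y_loop - y_vec)));
```

Both versions compute the same vector up to rounding; only the running time differs.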

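If I understand correctly, X is a matrix of size m*numJ, where m is the number of examples and numJ is the number of features (or the dimension of the space in which each point lies). In that case, you can rewrite the cost function as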

(1/(2*m)) * (X*theta-y)'*(X*theta-y);%since ||v||_2^2=v'*v for any vector v in Euclidean space

Now, we know from basic matrix calculus that for any two vectors s and v that are functions from R^{numJ} to R^m, the Jacobian of s^{t}v is given by

s^{t}*Jacobian(v)+v^{t}*Jacobian(s) %this Jacobian will have size 1*numJ.

Applying this to your cost function, we get

jacobian=(1/m)*(theta'*X'-y')*X;
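One way to gain some confidence in this expression is to compare it against a finite-difference approximation of the cost. The sketch below does that on made-up data; X, y, theta, the step h, and the helper names cost and numeric_jac are assumptions for this example and are not part of the assignment.

```octave
% Hypothetical data: 5 examples, 2 features (first column is the intercept).
X = [1 1; 1 2; 1 3; 1 4; 1 5];
y = [2; 4; 5; 4; 5];
theta = [0.5; -0.3];
m = length(y);

% Vectorized cost, written as in the expression above.
cost = @(t) (1/(2*m)) * (X*t - y)' * (X*t - y);

% Analytic Jacobian (a 1-by-numJ row vector).
jacobian = (1/m) * (theta'*X' - y') * X;

% Central finite differences, one component of theta at a time.
h = 1e-6;
numeric_jac = zeros(1, numel(theta));
for j = 1:numel(theta)
    step = zeros(size(theta));
    step(j) = h;
    numeric_jac(j) = (cost(theta + step) - cost(theta - step)) / (2*h);
end

disp(jacobian);
disp(numeric_jac);
disp(max(abs(jacobian - numeric_jac)));   % should be close to zero
```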

So if you simply replace

for i = 1:m
    for j = 1:numJ
        %%% theta(j) updates
    end
end

with

%note that the gradient is the transpose of the Jacobian we've computed
%(-= is Octave syntax; in Matlab write theta=theta-alpha*(1/m)*X'*(X*theta-y);)
theta-=alpha*(1/m)*X'*(X*theta-y);

you should see a big improvement in performance.
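Putting the pieces together, the whole function might look like the sketch below. It is only a sketch: it assumes X already contains the intercept column, it restates computeCost in vectorized form so the block is self-contained, and it has not been run against the course's submit() script. The data in the usage part is made up.

```octave
1;  % marks this file as a script so the functions below can be defined inline

function J = computeCost(X, y, theta)
    % Same cost as in the question, written in vectorized form.
    m = length(y);
    J = (1/(2*m)) * (X*theta - y)' * (X*theta - y);
end

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);                    % number of training examples
    J_history = zeros(num_iters, 1);  % cost after each iteration

    for iter = 1:num_iters
        % One batch step: all components of theta are updated together,
        % using the same theta on the right-hand side.
        theta = theta - (alpha/m) * X' * (X*theta - y);

        % Trailing semicolons keep Octave from printing every value.
        J_history(iter) = computeCost(X, y, theta);
    end
end

% Usage on made-up data generated from y = 1 + 2*x.
X = [ones(5, 1), (1:5)'];
y = 1 + 2*(1:5)';
[theta, J_history] = gradientDescent(X, y, zeros(2, 1), 0.05, 5000);
disp(theta');   % expected to approach [1 2]
```

The loop over iterations stays, since each step depends on the previous theta; only the loops over examples and features are replaced by matrix products.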
