How to write cost function formula from Andrew Ng assignment in Octave?

Question

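My implementation (see below) returns the scalar value 3.18, which is not the correct answer. The value should be 0.693. Where does my code deviate from the equation?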

These are the commands that load the data and run the cost function in Octave:

data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
[m, n] = size(X);
X = [ones(m, 1) X];
initial_theta = zeros(n + 1, 1);
[cost, grad] = costFunction(initial_theta, X, y);

Here is a link to ex2data; the data is included in this package: data link.

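The formula for the cost function is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\big(h_\theta(x^{(i)})\big) - \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \right], \qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$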

This is the code I used:

function [J, grad] = costFunction(theta, X, y)

m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0; %#ok<NASGU>
grad = zeros(size(theta)); %#ok<NASGU>

hx = sigmoid(X * theta)';
m = length(X);

J = sum(-y' * log(hx) - (1 - y')*log(1 - hx)) / m;

grad = X' * (hx - y) / m;

end

Here is the sigmoid function:

function g = sigmoid(z)
g = 1/(1+exp(-z));
end

Answer 1

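Your sigmoid function is incorrect. The incoming data is a vector, but the `/` operator you are using performs matrix division. The division needs to be element-wise: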

function g = sigmoid(z)
    g = 1.0 ./ (1.0 + exp(-z));
end

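Writing 1 / A, where A is an expression, performs matrix right division, which in effect multiplies by the inverse of A. Because a true inverse exists only for square matrices, for a vector this computes a pseudo-inverse solution instead, which is definitely not what you want.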
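To make the difference concrete, here is a minimal sketch comparing the two operators on a small vector. The input `z` is made up for illustration, and the names `bad` and `good` are not from the original code.

```octave
% Sketch: compare matrix right division with element-wise division.
z = zeros(4, 1);              % hypothetical input: four scores, all zero

bad  = 1 / (1 + exp(-z));     % mrdivide: solves q * (1 + exp(-z)) = 1 for q
good = 1 ./ (1 + exp(-z));    % element-wise: one sigmoid value per entry of z

disp(size(bad));              % 1 4  (a row vector, not the shape of z)
disp(size(good));             % 4 1  (same shape as z)
disp(good');                  % every entry is 0.5, because sigmoid(0) = 0.5
```

In Octave, which returns a minimum-norm solution for non-square systems, each entry of `bad` here should be 0.125 rather than 0.5. Values that are too small like this inflate `-log(hx)`, which is consistent with the cost coming out far above 0.693.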

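You can keep most of your costFunction code as it is, since it already uses dot products. I would get rid of the sum, because the dot product already performs the summation. I'll mark my changes with comments: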

function [J, grad] = costFunction(theta, X, y)

m = length(y); % number of training examples

% You need to return the following variables correctly 
%J = 0; %#ok<NASGU> <-- Don't need to declare this as you'll create the variables later
%grad = zeros(size(theta)); %#ok<NASGU>

hx = sigmoid(X * theta);  % <-- Remove transpose
m = size(X, 1);  % <-- Number of rows; length(X) returns the largest dimension

J = (-y' * log(hx) - (1 - y')*log(1 - hx)) / m; % <-- Remove sum

grad = X' * (hx - y) / m;

end
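As a sanity check on the expected value of 0.693: with theta set to all zeros, the sigmoid returns 0.5 for every example, so the cost reduces to -log(0.5) = log(2) ≈ 0.6931 whatever the labels are. The sketch below illustrates this on a small made-up data set (not ex2data1.txt). The anonymous functions `sigmoid_fn` and `cost_fn` are illustrative names used only so the snippet runs as a standalone script.

```octave
% Element-wise sigmoid, written as an anonymous function for a standalone script.
sigmoid_fn = @(z) 1 ./ (1 + exp(-z));

% Vectorized logistic-regression cost; the inner products do the summation.
cost_fn = @(theta, X, y) ...
    (-y' * log(sigmoid_fn(X * theta)) - (1 - y)' * log(1 - sigmoid_fn(X * theta))) / size(X, 1);

% Made-up data for illustration only (NOT ex2data1.txt).
X = [ones(4, 1), [1 2; 3 4; 5 6; 7 8]];   % intercept column plus two features
y = [0; 1; 1; 0];
initial_theta = zeros(size(X, 2), 1);

J = cost_fn(initial_theta, X, y);
printf('Cost at initial theta: %.4f\n', J);   % log(2), about 0.6931
```

If a cost function returns something other than about 0.693 at theta = 0, the sigmoid or the vector shapes are the first things worth checking.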

Answer 2

Here is the code for the sigmoid function, which I think is where you made the mistake:

function g = sigmoid(z)
   g = zeros(size(z));
   temp=1+exp(-1.*z);
   g=1./temp;
end


function [J, grad] = costFunction(theta, X, y)
   m = length(y); 
   J = 0;
   grad = zeros(size(theta));
   h=X*theta;
   xtemp=sigmoid(h);
   temp1=(-y'*log(xtemp));
   temp2=(1-y)'*log(1-xtemp);
   J=1/m*sum(temp1-temp2);
   grad=1/m*(X'*(xtemp-y));
end

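I would also write it as (1-y)', as in temp2=(1-y)'*log(1-xtemp). This gives the same result as (1 - y') but makes the transpose easier to read.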

machine-learning

octave

gradient-descent

logistic-regression