Matlab Regularized Logistic Regression - how to compute gradient
I am currently taking the Machine Learning course on the Coursera platform, and I am attempting to implement logistic regression. To do so, I use gradient descent to minimize the cost function, and I am writing a function called costFunctionReg.m that returns both the cost and the gradient of each parameter, evaluated at the current set of parameters.
The problem is described below:
My cost function works, but the gradient function does not. Note that I would prefer to implement this using loops rather than element-wise operations.
I am computing theta[0] (theta(1) in MATLAB) separately, since it is not regularized, i.e. the regularization term (with lambda) is not applied to the first parameter.
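For reference, recapping the assignment's formulas, the function should return the standard regularized logistic regression cost and gradient (written here with MATLAB's 1-based indexing, so $\theta_1$ is the unregularized bias term):

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=2}^{n}\theta_j^2$$

$$\frac{\partial J}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_1^{(i)}, \qquad \frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j = 2,\dots,n)$$

where $h_\theta(x) = \mathrm{sigmoid}(\theta^{T}x)$.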
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y);     % number of training examples
n = length(theta); % number of parameters (features)

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

% ----------------------1. Compute the cost-------------------

% hypothesis
h = sigmoid(X * theta);

for i = 1 : m
    % The cost for the ith term before regularization
    J = J - ( y(i) * log(h(i)) ) - ( (1 - y(i)) * log(1 - h(i)) );

    % Adding regularization term
    for j = 2 : n
        J = J + (lambda / (2*m) ) * ( theta(j) )^2;
    end
end
J = J/m;

% ----------------------2. Compute the gradients-------------------

% not regularizing theta[0] i.e. theta(1) in matlab
j = 1;
for i = 1 : m
    grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j);
end

for j = 2 : n
    for i = 1 : m
        grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j) + lambda * theta(j);
    end
end

grad = (1/m) * grad;

% =============================================================

end
What am I doing wrong?
The way you are applying regularization is incorrect. The regularization term should be added once, after you sum over all training examples, but you are adding it after each example. If you left the code as it is, you would inadvertently make the gradient step larger and eventually overshoot the solution. This overshooting accumulates and will inevitably give you a gradient vector of Inf or -Inf for all components (except for the bias term).
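To spell out the arithmetic: with lambda * theta(j) inside the loop over the m training examples, that term is accumulated m times, so after the final division by m the j-th gradient component becomes

$$\frac{1}{m}\sum_{i=1}^{m}\left[\left(h_i - y_i\right)x_{ij} + \lambda\theta_j\right] = \frac{1}{m}\sum_{i=1}^{m}\left(h_i - y_i\right)x_{ij} + \lambda\theta_j,$$

i.e. the regularization contribution is $\lambda\theta_j$ instead of the intended $\frac{\lambda}{m}\theta_j$, a factor of $m$ too large on every regularized component.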
Simply put, move your lambda*theta(j) statement so that it executes after the inner for loop (the one over the training examples) terminates:
for j = 2 : n
    for i = 1 : m
        grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j); % Change
    end
    grad(j) = grad(j) + lambda * theta(j); % Change
end
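As an aside, although the question prefers loops, the same gradient can be computed without them. A minimal vectorized sketch, using the same sigmoid helper the question's code already calls:

% Vectorized gradient, equivalent to the corrected loops above
h = sigmoid(X * theta);         % m x 1 vector of predictions
reg = (lambda / m) * theta;     % regularization for every parameter ...
reg(1) = 0;                     % ... except the unregularized bias term theta(1)
grad = (1 / m) * (X' * (h - y)) + reg;

Zeroing out reg(1) replaces the separate j = 1 loop in the original code and keeps the bias term unregularized.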