Gradient descent with multiple variables without matrices
I'm new to Matlab and machine learning, and I'm trying to write a gradient descent function without using matrix operations.
- m is the number of examples in my training set
- n is the number of features per example
The function gradientDescentMulti takes 5 arguments (a sample call is sketched after this list):
- X: an m x n matrix
- y: an m-dimensional vector
- theta: an n-dimensional vector
- alpha: a real number (the learning rate)
- nb_iters: the number of iterations
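For context, here is a minimal sketch of a call to the function; the data values below are made up purely for illustration:

X = [1 2; 1 3; 1 4];     % m = 3 examples, n = 2 features (first column is an intercept term)
y = [3; 5; 7];           % ground-truth values, one per example
theta = zeros(2, 1);     % initial parameters
alpha = 0.1;             % learning rate
theta = gradientDescentMulti(X, y, theta, alpha, 2000); % should converge to roughly [-1; 2]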
I already have a solution that uses matrix multiplication:
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    for iter = 1:num_iters
        gradJ = 1/m * (X'*X*theta - X'*y);
        theta = theta - alpha * gradJ;
    end
end
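As an aside, X'*X*theta - X'*y equals X'*(X*theta - y), so the same gradient can be computed from the residual vector; that residual form is exactly what the loop versions further down build up element by element. A sketch, equivalent to the loop body above:

for iter = 1:num_iters
    residual = X*theta - y;          % h(x) - y for every training example at once
    gradJ = (1/m) * (X' * residual); % identical to 1/m * (X'*X*theta - X'*y)
    theta = theta - alpha * gradJ;
end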
The result after iterating:
theta =
1.0e+05 *
3.3430
1.0009
0.0367
But now I'm trying to do the same thing without matrix multiplication, and here is my function:
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    n = size(X, 2); % number of features
    for iter = 1:num_iters
        new_theta = zeros(1, n);
        %// for each feature, find the new theta
        for t = 1:n
            S = 0;
            for example = 1:m
                h = 0;
                for example_feature = 1:n
                    h = h + (theta(example_feature) * X(example, example_feature));
                end
                S = S + ((h - y(example)) * X(example, n)); %// Accumulate over the examples for this feature
            end
            new_theta(t) = theta(t) - alpha * (1/m) * S; %// Calculate the new theta for this feature
        end
        %// only at the end of the iteration, update all thetas simultaneously
        theta = new_theta'; %// Transpose new_theta (row vector) into theta (column vector)
    end
end
As a result, all of the thetas come out the same :/
theta =
1.0e+04 *
3.5374
3.5374
3.5374
If you look at the gradient update rule, it is more efficient to first compute the hypothesis for all training examples, then subtract the ground-truth value of each example and store these differences in an array or vector. Once you have done that, computing the update rule is very easy. It seems to me that you are not doing this in your code. Incidentally, that is also where your bug is: inside the inner sum you multiply by X(example, n) instead of X(example, t), so every feature accumulates exactly the same sum S, which is why all of your thetas converge to the same value.
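For reference, the per-feature update rule being discussed is, in standard notation:

$$\theta_t := \theta_t - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_t^{(i)}, \qquad h_\theta(x^{(i)}) = \sum_{k=1}^{n}\theta_k\,x_k^{(i)}$$

The inner sum over examples is the same for every feature t except for the trailing factor x_t^{(i)}, which is why the differences h_theta(x^{(i)}) - y^{(i)} can be computed once and reused.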
I have therefore rewritten your code, but with a separate array that stores the difference between the hypothesis and the ground truth for each training example. Once those are computed, I apply the update rule to each feature separately:
for iter = 1 : num_iters
    %// Compute hypothesis differences with ground truth first
    h = zeros(1, m);
    for t = 1 : m
        %// Compute hypothesis
        for tt = 1 : n
            h(t) = h(t) + theta(tt)*X(t,tt);
        end
        %// Compute difference between hypothesis and ground truth
        h(t) = h(t) - y(t);
    end
    %// Now update parameters
    new_theta = zeros(1, n);
    %// for each feature, find the new theta
    for tt = 1 : n
        S = 0;
        %// For each sample, compute products of hypothesis difference
        %// and the right feature of the sample and accumulate
        for t = 1 : m
            S = S + h(t)*X(t,tt);
        end
        %// Compute gradient descent step
        new_theta(tt) = theta(tt) - (alpha/m)*S;
    end
    theta = new_theta'; %// Transpose new_theta (row vector) into theta (column vector)
end
When I do this, I get the same answer as with the matrix formulation.
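A quick way to convince yourself of the equivalence is to run both versions on the same data and compare. In this sketch, gradientDescentMultiLoop is a hypothetical name for the loop version above wrapped in a function with the same signature as the original:

theta0 = zeros(size(X, 2), 1); %// common starting point
theta_loop   = gradientDescentMultiLoop(X, y, theta0, alpha, num_iters); %// loop version (hypothetical wrapper)
theta_matrix = gradientDescentMulti(X, y, theta0, alpha, num_iters);     %// matrix version from the question
disp(norm(theta_loop - theta_matrix)); %// should print a value very close to 0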