Accuracy issues with multiplication of matrices in Matlab R2012b
I have implemented a script that performs constrained optimization to solve for the optimal parameters of a Support Vector Machine model. I have noticed that my script gives inaccurate results for some reason (although they are very close to the true values). For example, a typical case is a computation whose result should be exactly 0, but which instead comes out as something like

-1/18014398509481984 = -5.551115123125783e-17

This happens when I multiply a matrix with a vector. What makes this even stranger is that if I do the multiplication manually in the MATLAB command window, I get exactly 0 as the result.

Let me give an example: if I use the vectors Aq = [-1 -1 1 1] and x = [12/65 28/65 32/65 8/65]', and I do the multiplication in the command window, the result is exactly 0, as you can see in the picture below:

On the other hand, if I do the same in my function script, I get not 0 but the value -1/18014398509481984.
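For reference, 18014398509481984 = 2^54, so the offending value is exactly -2^-54, one quarter of eps. A minimal sketch of my own (not part of the original script) showing how a one-ulp change in a single entry produces a residual of this size:

Aq = [-1 -1 1 1];
x  = [12/65; 28/65; 32/65; 8/65];  % literals, as typed in the command window
Aq*x                               % reported to be exactly 0 when typed interactively
xp = x;
xp(4) = xp(4) + eps(8/65);         % perturb one entry by a single ulp (assumed cause)
Aq*xp                              % now a residual on the order of 1e-17 appears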
Here is the part of my script responsible for this multiplication (I have added Aq and x to the script in order to display their values):
disp('DOT PRODUCT OF ACTIVE SET AND NEW POINT: ')
Aq
x
Aq*x
Here is the result of the above code when it is run:

As you can see, the value is not exactly 0, even though it really should be. Note that this problem does not occur for all possible values of Aq and x. If Aq = [-1 -1 1 1] and x = [4/13 4/13 4/13 4/13], the result is exactly 0, as shown below:

What is causing this inaccuracy, and how can I fix it?

P.S. I did not include my entire code because it is not well documented and is a few hundred lines long, but I will if needed.

Thanks!
UPDATE: A new test, using Ander Biguri's advice:

UPDATE 2: THE CODE
function [weights, alphas, iters] = solveSVM(data, labels, C, e)
% FUNCTION [weights, alphas, iters] = solveSVM(data, labels, C, e)
%
% AUTHOR: jjepsuomi
%
% VERSION: 1.0
%
% DESCRIPTION:
% - This function will attempt to solve the optimal weights for a Support
% Vector Machines (SVM) model using active set method with gradient
% projection.
%
% INPUTS:
% "data" a n-by-m data matrix. The number of rows 'n' corresponds to the
% number of data points and the number of columns 'm' corresponds to the
% number of variables.
% "labels" a 1-by-n row vector of data labels from the set {-1,1}.
% "C" Box constraint upper limit. This will constrain the values of 'alphas'
% to the range 0 <= alphas <= C. If hard-margin SVM model is required set
% C=Inf.
% "e" a real value corresponding to the convergence criterion, that is if
% solution Xi and Xi-1 are within distance 'e' from each other stop the
% learning process, i.e. IF |F(Xi)-F(Xi-1)| < e ==> stop learning process.
%
% OUTPUTS:
% "weights" a vector corresponding to the optimal decision line parameters.
% "alphas" a vector of alpha-values corresponding to the optimal solution
% of the dual optimization problem of SVM.
% "iters" number of iterations until learning stopped.
%
% EXAMPLE USAGE 1:
%
% 'Hard-margin SVM':
%
% data = [0 0;2 2;2 0;3 0];
% labels = [-1 -1 1 1];
% [weights, alphas, iters] = solveSVM(data, labels, Inf, 10^-100)
%
% EXAMPLE USAGE 2:
%
% 'Soft-margin SVM':
%
% data = [0 0;2 2;2 0;3 0];
% labels = [-1 -1 1 1];
% [weights, alphas, iters] = solveSVM(data, labels, 0.8, 10^-100)
% STEP 1: INITIALIZATION OF THE PROBLEM
format long
% Calculate linear kernel matrix
L = kron(labels', labels);
K = data*data';
% Hessian matrix
Qd = L.*K;
% The minimization function
L = @(a) (1/2)*a'*Qd*a - ones(1, length(a))*a;
% Gradient of the minimizable function
gL = @(a) a'*Qd - ones(1, length(a));
% STEP 2: THE LEARNING PROCESS, ACTIVE SET WITH GRADIENT PROJECTION
% Initial feasible solution (required by gradient projection)
x = zeros(length(labels), 1);
iters = 1;
optfound = 0;
while optfound == 0 % criterion met
    % Negative of the gradient at the current solution
    g = -gL(x);
    % Set the active set and projection matrix
    Aq = labels; % In plane y^Tx = 0
    P = eye(length(x)) - Aq'*inv(Aq*Aq')*Aq; % In plane projection
    % Values smaller than 'eps' are changed into 0
    P(find(abs(P-0) < eps)) = 0;
    d = P*g'; % Projection onto plane
    if ~isempty(find(x==0 | x==C)) % Constraints active?
        acinds = find(x==0 | x==C);
        for i = 1:length(acinds)
            if (x(acinds(i)) == 0 && d(acinds(i)) < 0) || (x(acinds(i)) == C && d(acinds(i)) > 0)
                % Make the constraint vector
                constr = zeros(1,length(x));
                constr(acinds(i)) = 1;
                Aq = [Aq; constr];
            end
        end
        % Update the projection matrix
        P = eye(length(x)) - Aq'*inv(Aq*Aq')*Aq; % In plane / box projection
        % Values smaller than 'eps' are changed into 0
        P(find(abs(P-0) < eps)) = 0;
        d = P*g'; % Projection onto plane / border
    end
    %%%% DISPLAY INFORMATION, THIS PART IS NOT NECESSARY, ONLY FOR DEBUGGING
    if Aq*x ~= 0
        disp('ACTIVE SET CONSTRAINTS Aq :')
        Aq
        disp('CURRENT SOLUTION x :')
        x
        disp('MULTIPLICATION OF Aq and x')
        Aq*x
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Values smaller than 'eps' are changed into 0
    d(find(abs(d-0) < eps)) = 0;
    if ~isempty(find(d~=0)) && rank(P) < length(x) % Line search for optimal lambda
        lopt = ((g*d)/(d'*Qd*d));
        lmax = inf;
        for i = 1:length(x)
            if d(i) < 0 && -x(i) ~= 0 && -x(i)/d(i) <= lmax
                lmax = -x(i)/d(i);
            elseif d(i) > 0 && (C-x(i))/d(i) <= lmax
                lmax = (C-x(i))/d(i);
            end
        end
        lambda = max(0, min([lopt, lmax]));
        if abs(lambda) < eps
            lambda = 0;
        end
        xo = x;
        x = x + lambda*d;
        iters = iters + 1;
    end
    % Check whether the search direction is the 0-vector or the 'e'-criterion is met.
    if isempty(find(d~=0)) || abs(L(x)-L(xo)) < e
        optfound = 1;
    end
end
%%% STEP 3: GET THE WEIGHTS
alphas = x;
w = zeros(1, length(data(1,:)));
for i = 1:size(data,1)
    w = w + labels(i)*alphas(i)*data(i,:);
end
svinds = find(alphas>0);
svind = svinds(1);
b = 1/labels(svind) - w*data(svind, :)';
%%% STEP 4: OPTIMALITY CHECK, KKT conditions. See KKT-conditions for reference.
weights = [b; w'];
datadim = length(data(1,:));
Q = [zeros(1,datadim+1); zeros(datadim, 1), eye(datadim)];
A = [ones(size(data,1), 1), data];
for i = 1:length(labels)
    A(i,:) = A(i,:)*labels(i);
end
LagDuG = Q*weights - A'*alphas;
Ac = A*weights - ones(length(labels),1);
alpA = alphas.*Ac;
LagDuG(abs(LagDuG) < 10^-14) = 0; % zero out near-zero gradient entries
if ~any(alphas < 0) && all(LagDuG == zeros(datadim+1,1)) && all(abs(Ac) >= 0) && all(abs(alpA) < 10^-6)
    disp('Optimal found, Karush-Kuhn-Tucker conditions satisfied.')
else
    disp('Optimal not found, Karush-Kuhn-Tucker conditions not satisfied.')
end
% VISUALIZATION FOR THE 2D-CASE
if size(data, 2) == 2
    pinds = find(labels > 0);
    ninds = find(labels < 0);
    plot(data(pinds, 1), data(pinds, 2), 'o', 'MarkerFaceColor', 'red', 'MarkerEdgeColor', 'black')
    hold on
    plot(data(ninds, 1), data(ninds, 2), 'o', 'MarkerFaceColor', 'blue', 'MarkerEdgeColor', 'black')
    Xb = min(data(:,1))-1;
    Xe = max(data(:,1))+1;
    Yb = -(b+w(1)*Xb)/w(2);
    Ye = -(b+w(1)*Xe)/w(2);
    lineh = plot([Xb Xe], [Yb Ye], 'LineWidth', 2);
    supvh = plot(data(find(alphas~=0), 1), data(find(alphas~=0), 2), 'g.');
    legend([lineh, supvh], 'Decision boundary', 'Support vectors');
    hold off
end
NOTE:

If you run EXAMPLE 1, you should get an output starting with the following:

As you can see, the multiplication of Aq and x does not produce the value 0, even though it should. This is not a bad thing in this particular example, but if I had more data points with lots of decimals, this inaccuracy would grow bigger and bigger, because the calculations are not exact. This is bad, for example, when I am searching for a new direction vector while moving towards the optimal solution in the gradient projection method: the search direction is not exactly the correct direction, only close to it. This is why I want exactly correct values... Is this possible?

I wonder whether the decimals in the data points have something to do with the accuracy of my results. See the picture below:

So the question is: is this caused by the data, or is there a problem in the optimization procedure...
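Even plain repeated addition accumulates rounding at this scale; a tiny sketch of my own (not from the script above):

% Summing 1/65 sixty-five times should give exactly 1, but each
% addition rounds to the nearest double, so a small residual remains.
s = 0;
for k = 1:65
    s = s + 1/65;
end
disp(s - 1)   % typically nonzero, on the order of 1e-16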
Are you using the format function in your script? It looks like you have used format rat somewhere.

You can always use the MATLAB eps function, which returns the precision that MATLAB uses internally. On my MATLAB R2014b, the absolute value of -1/18014398509481984 is smaller than that:

format long
a = abs(-1/18014398509481984)
b = eps
a < b

This basically means that the result is zero (MATLAB just stopped the computation, because according to the eps value the result was already good).
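In the same spirit, the script's debug test if Aq*x ~= 0 could compare against a tolerance instead of exact zero; a sketch (the tolerance choice is my assumption, not something MATLAB prescribes):

% Treat anything within a few ulps of zero as zero instead of
% testing for exact equality.
r   = Aq*x;                  % residual vector from the script
tol = 10*eps(max(abs(x)));   % assumed tolerance: a few ulps at the scale of x
if any(abs(r) > tol)
    disp('residual is genuinely nonzero')
end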
Otherwise, you can use format long in your script before the computation.
EDIT

I see the inv function in your code. Try replacing it with the \ operator (mldivide). Its results will be more accurate, because it uses Gaussian elimination without forming the inverse.

The inv documentation states:
In practice, it is seldom necessary to form the explicit inverse of a
matrix. A frequent misuse of inv arises when solving the system of
linear equations Ax = b. One way to solve this is with x = inv(A)*b. A
better way, from both an execution time and numerical accuracy
standpoint, is to use the matrix division operator x = A\b. This
produces the solution using Gaussian elimination, without forming the
inverse.
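Applied to the projection-matrix lines in your script, the replacement would look like this (a sketch, assuming Aq*Aq' is well conditioned):

% Before: P = eye(length(x)) - Aq'*inv(Aq*Aq')*Aq;
% After: solve the linear system with mldivide instead of forming the inverse
P = eye(length(x)) - Aq'*((Aq*Aq')\Aq);
d = P*g';   % projected (negative) gradient, unchanged from the original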
Based on the code provided, this is how I tested it:

I put a breakpoint at the following code:

if Aq*x ~= 0
    disp('ACTIVE SET CONSTRAINTS Aq :')
    Aq
    disp('CURRENT SOLUTION x :')
    x
    disp('MULTIPLICATION OF Aq and x')
    Aq*x
end

When the if branch was executed, I typed at the console:
K>> format rat; disp(x);
12/65
28/65
32/65
8/65
K>> disp(x == [12/65; 28/65; 32/65; 8/65]);
0
1
0
0
K>> format('long'); disp(max(abs(x - [12/65; 28/65; 32/65; 8/65])));
1.387778780781446e-17
K>> disp(eps(8/65));
1.387778780781446e-17
This shows that it is a display problem: format rat deliberately uses small integers to express a value, at the cost of precision. Apparently the true value of x(4) is the next double after 8/65.

So this raises the question: are you sure that numeric convergence should depend on flipping the least significant bit of a double precision value?
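To see the display effect directly, a short sketch (my own check, assuming the true x(4) is one ulp above 8/65):

format long
delta = eps(8/65)       % 1.387778780781446e-17, matching the difference above
x4 = 8/65 + delta;      % the next representable double after 8/65 (assumed)
format rat
disp(x4)                % still prints as 8/65, hiding the one-ulp offset
format long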