Efficiently calculating weighted distance in MATLAB
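Several posts exist about efficiently calculating pairwise distances in MATLAB. These posts tend to concern quickly calculating the Euclidean distance between large numbers of points.
I need to create a function that quickly calculates the pairwise differences between smaller numbers of points (typically fewer than 1000 pairs). Within the grander scheme of the program I am writing, this function will be executed many thousands of times, so even small gains in efficiency matter. The function needs to be flexible in two ways:
- On any given call, the distance metric can be Euclidean or city-block.
- The dimensions of the data are weighted.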
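To pin down what these two requirements amount to, the weighted distance can presumably be written in the usual weighted Minkowski form. The symbols here (rows A_i and B_j, per-dimension weights w_k, number of dimensions d, exponent r) are notation assumed for illustration rather than taken from the original post; r = 1 corresponds to city-block and r = 2 to Euclidean.

```latex
D_{ij} = \left( \sum_{k=1}^{d} w_k \, \lvert A_{ik} - B_{jk} \rvert^{r} \right)^{1/r},
\qquad r \in \{1, 2\}
```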
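As far as I can tell, no existing post solves this particular problem. The Statistics Toolbox offers pdist and pdist2, which accept many different distance functions but not weighting. I have seen extensions of these functions that allow weighting, but those extensions do not let the user select different distance functions.
Ideally, I would like to avoid using functions from the Statistics Toolbox (I am not certain that users of the function will have access to that toolbox).
I have written two functions to accomplish this task. The first uses tricky calls to repmat and permute, and the second simply uses for-loops.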
function [D] = pairdist1(A, B, wts, distancemetric)
% get some information about the data
numA = size(A,1);
numB = size(B,1);
% pick the exponent for the requested metric
if strcmp(distancemetric,'cityblock')
    r = 1;
elseif strcmp(distancemetric,'euclidean')
    r = 2;
else
    error('Function only accepts "cityblock" and "euclidean" distance')
end
% format weights for multiplication (wts is a 1-by-d row vector)
wts = repmat(wts,[numA,1,numB]);
% get featural differences between A and B pairs
A = repmat(A,[1 1 numB]);
B = repmat(permute(B,[3,2,1]),[numA,1,1]);
differences = abs(A-B).^r;
% weigh difference values before combining them
differences = differences.*wts;
% combine features, then take the r-th root to get distance
% (the root has to follow the sum for the Euclidean case to be correct)
D = permute(sum(differences,2),[1,3,2]).^(1/r);
end
and:
function [D] = pairdist2(A, B, wts, distancemetric)
% get some information about the data
numA = size(A,1);
numB = size(B,1);
% pick the exponent for the requested metric
if strcmp(distancemetric,'cityblock')
    r = 1;
elseif strcmp(distancemetric,'euclidean')
    r = 2;
else
    error('Function only accepts "cityblock" and "euclidean" distance')
end
% use for-loops to generate differences
D = zeros(numA,numB);
for i=1:numA
    for j=1:numB
        % raise feature differences to the r-th power, then weigh them
        differences = abs(A(i,:) - B(j,:)).^r;
        differences = differences.*wts;
        % sum over features, then take the r-th root
        D(i,j) = sum(differences,2).^(1/r);
    end
end
end
Here is a performance test:
A = rand(10,3);
B = rand(80,3);
wts = [0.1 0.5 0.4];
distancemetric = 'cityblock';
tic
D1 = pairdist1(A,B,wts,distancemetric);
toc
tic
D2 = pairdist2(A,B,wts,distancemetric);
toc
Elapsed time is 0.000238 seconds.
Elapsed time is 0.005350 seconds.
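Clearly, the repmat-and-permute version runs faster than the double-for-loop version, at least for smaller datasets. But I also know that calls to repmat often slow things down. So I am wondering whether anyone in the SO community can offer advice for improving the efficiency of either function!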
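A single tic/toc pair around a sub-millisecond call is noisy and can include first-call overhead, so the gap above may be over- or understated. A possibly more robust sketch uses timeit, which runs each call several times and reports a typical time. It assumes the two functions above are saved as pairdist1.m and pairdist2.m on the MATLAB path, and that timeit is available (R2013b or newer, as far as I know).

```matlab
% Sketch: time both implementations with timeit instead of one tic/toc pair.
% Assumes pairdist1.m and pairdist2.m (the functions above) are on the path.
A      = rand(10,3);
B      = rand(80,3);
wts    = [0.1 0.5 0.4];
metric = 'cityblock';

% timeit takes a zero-argument function handle and returns seconds per call
t1 = timeit(@() pairdist1(A, B, wts, metric));   % repmat/permute version
t2 = timeit(@() pairdist2(A, B, wts, metric));   % double for-loop version

fprintf('pairdist1: %.6f s\npairdist2: %.6f s\n', t1, t2);
```

Repeating this for a few sizes of A and B would show where, if anywhere, the ranking changes.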
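EDIT
@Luis Mendo provided a nice cleanup of the repmat-and-permute function using bsxfun. I compared his function with my original on datasets of varying size:
As the data get larger, the bsxfun version becomes the clear winner!
EDIT #2
I have finished writing the function, and it is available on github [link]. I ended up finding a pretty good vectorized method for computing Euclidean distance [link], so I use that method in the Euclidean case, and I took @Divakar's advice for the city-block case. It is still not as fast as pdist2, but it is definitely faster than either of the approaches I laid out earlier in this post, and it easily accepts weights.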
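The linked method is not reproduced here, so the following is only a sketch of one common vectorized approach to the Euclidean case, not necessarily the one used in the GitHub function. It expands the weighted squared difference as sum_k w_k a_k^2 + sum_k w_k b_k^2 - 2 sum_k w_k a_k b_k, so the cross term becomes a single matrix multiplication. The function name weuclid is hypothetical, and wts is assumed to be a vector with one non-negative weight per column.

```matlab
function D = weuclid(A, B, wts)
% WEUCLID  Sketch of weighted Euclidean distances between rows of A and B.
%   A is numA-by-d, B is numB-by-d, wts has d non-negative entries.
%   (Hypothetical helper for illustration, not the author's function.)
w  = wts(:).';                            % force a 1-by-d row of weights
Aw = bsxfun(@times, A, w);                % weight the columns of A
aa = sum(Aw .* A, 2);                     % numA-by-1: sum_k w_k * a_k^2
bb = sum(bsxfun(@times, B, w) .* B, 2);   % numB-by-1: sum_k w_k * b_k^2
% squared distances: aa(i) + bb(j) - 2 * sum_k w_k * a_ik * b_jk
D2 = bsxfun(@plus, aa, bb.') - 2 * (Aw * B.');
% round-off can leave tiny negative values, so clamp before the root
D  = sqrt(max(D2, 0));
end
```

The trade-off is numerical: for points that are very close together, the expansion loses some precision compared with subtracting first, which is why the result is clamped at zero.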
You can replace repmat with bsxfun. Doing so avoids explicit repetition, so it is more memory-efficient and probably faster:
function D = pairdist1(A, B, wts, distancemetric)
if strcmp(distancemetric,'cityblock')
    r = 1;
elseif strcmp(distancemetric,'euclidean')
    r = 2;
else
    error('Function only accepts "cityblock" and "euclidean" distance')
end
% bsxfun expands singleton dimensions, so no explicit repmat is needed
differences = abs(bsxfun(@minus, A, permute(B, [3 2 1]))).^r;
differences = bsxfun(@times, differences, wts);
% sum over features, then take the r-th root
D = permute(sum(differences,2),[1,3,2]).^(1/r);
end
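On MATLAB R2016b or newer, arithmetic operators expand singleton dimensions on their own, so the same idea can be sketched without bsxfun. The function name pairdist_implicit is hypothetical; it takes the exponent r directly (1 for city-block, 2 for Euclidean), assumes wts is a 1-by-d row vector, and takes the root after the sum, as in the usual weighted Minkowski definition.

```matlab
function D = pairdist_implicit(A, B, wts, r)
% PAIRDIST_IMPLICIT  Sketch using implicit expansion (R2016b or newer).
%   A is numA-by-d, B is numB-by-d, wts is 1-by-d, r is 1 or 2.
%   (Hypothetical helper for illustration.)
differences = abs(A - permute(B, [3 2 1])).^r;     % numA-by-d-by-numB
differences = differences .* wts;                   % weight each dimension
D = permute(sum(differences, 2), [1 3 2]).^(1/r);   % numA-by-numB
end
```

On older releases the operators raise a dimension-mismatch error instead, so the bsxfun form remains the portable choice.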
For r = 1 (the "cityblock" case), you can use bsxfun to get elementwise subtractions and then use matrix multiplication, which should speed things up. The implementation would look something like this:
%// Calculate absolute elementwise subtractions
absm = abs(bsxfun(@minus,permute(A,[1 3 2]),permute(B,[3 1 2])));
%// Perform matrix multiplications with the given weights and reshape
D = reshape(reshape(absm,[],size(A,2))*wts(:),size(A,1),[]);
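As a quick sanity check, the matrix-multiplication result can be compared against a plain double loop on small random inputs. This is only a sketch; the sizes, the weights and the variable names Dref and maxerr are arbitrary choices for illustration.

```matlab
% Sketch: verify the matrix-multiplication city-block distance against loops.
A   = rand(4,3);
B   = rand(5,3);
wts = [0.1 0.5 0.4];

% vectorized version: numA-by-numB-by-d array of absolute differences,
% flattened to (numA*numB)-by-d and multiplied by the weight column
absm = abs(bsxfun(@minus, permute(A,[1 3 2]), permute(B,[3 1 2])));
D    = reshape(reshape(absm,[],size(A,2))*wts(:), size(A,1), []);

% reference version: explicit loops over every pair of rows
Dref = zeros(size(A,1), size(B,1));
for i = 1:size(A,1)
    for j = 1:size(B,1)
        Dref(i,j) = sum(abs(A(i,:) - B(j,:)) .* wts);
    end
end

maxerr = max(abs(D(:) - Dref(:)));
fprintf('max abs difference: %g\n', maxerr);
```

The reported difference should be at the level of floating-point round-off.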