How to find the closest (nearest) value within a vector to another vector?
I have two vectors of equal size, for example
A=[2.29 2.56 2.77 2.90 2.05] and
B=[2.34 2.62 2.67 2.44 2.52].
I am interested in finding the closest (nearly equal) values across the two equal-sized vectors A and B, i.e. among all the elements of A, which value is closest to any element of B? The solution should also scale to any number of (equal-sized) vectors, meaning it should be possible to find the closest values among a set of equal-sized vectors A, B and C. The two resulting values can come from either of the two vectors.
To be clear, I am not interested in finding the closest values within a single vector. The answer for the example above is the values 2.56 and 2.52.
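To make the task concrete, a naive brute-force version would simply check every pair of elements across the two vectors; for the example data above it should return 2.56 and 2.52:
A = [2.29 2.56 2.77 2.90 2.05];
B = [2.34 2.62 2.67 2.44 2.52];
best = Inf;
for i = 1:numel(A)
    for j = 1:numel(B)
        if abs(A(i) - B(j)) < best
            best = abs(A(i) - B(j));
            a = A(i);   % closest value taken from A
            b = B(j);   % closest value taken from B
        end
    end
end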
As a starting point for two vectors, using bsxfun:
%// data
A = [2.29 2.56 2.77 2.90 2.05]
B = [2.34 2.62 2.67 2.44 2.52]
%// distance matrix
dist = abs(bsxfun(@minus,A(:),B(:).'));
%// find row and col indices of minimum
[~,idx] = min(dist(:))
[ii,jj] = ind2sub( [numel(A), numel(B)], idx)
%// output
a = A(ii)
b = B(jj)
Now you could put this into a loop, etc.
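If it is not obvious what such a loop would look like, one possible sketch (assuming the vectors are collected in a cell array data) is to compare every pair of vectors and keep the overall closest pair:
%// sketch: compare every pair of vectors and keep the overall closest pair
data = {A, B};                 %// add further vectors here as needed
best = Inf;
for p = 1:numel(data)
    for q = p+1:numel(data)
        d = abs(bsxfun(@minus, data{p}(:), data{q}(:).'));
        [m, idx] = min(d(:));
        if m < best
            best = m;
            [ii, jj] = ind2sub(size(d), idx);
            a = data{p}(ii);
            b = data{q}(jj);
        end
    end
end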
By the way:
dist = abs(bsxfun(@minus,A(:),B(:).'));
is equivalent to the more obvious:
dist = pdist2( A(:), B(:) )
But I would rather go with the first solution to avoid the overhead.
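If you want to check the overhead claim on your own machine, timeit can be used for a rough comparison, reusing A and B from above (pdist2 requires the Statistics and Machine Learning Toolbox):
f1 = @() abs(bsxfun(@minus, A(:), B(:).'));   %// plain bsxfun version
f2 = @() pdist2(A(:), B(:));                  %// toolbox version
timeit(f1)
timeit(f2)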
Finally, a fully vectorized approach for multiple vectors:
%// data
data{1} = [2.29 2.56 2.77 2.90 2.05];
data{2} = [2.34 2.62 2.67 2.44 2.52];
data{3} = [2.34 2.62 2.67 2.44 2.52].*2;
data{4} = [2.34 2.62 2.67 2.44 2.52].*4;
%// length of each vector
N = 5;
%// create Filter for distance matrix
nans(1:numel(data)) = {NaN(N)};
mask = blkdiag(nans{:}) + 1;
%// create new input for bsxfun
X = [data{:}];
%// filtered distance matrix
dist = mask.*abs(bsxfun(@minus,X(:),X(:).'));
%// find row and col indices of minimum
[~,idx] = min(dist(:))
[ii,jj] = ind2sub( size(dist), idx)
%// output
a = X(ii)
b = X(jj)
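As a side note, on MATLAB R2016b or newer the bsxfun call can be replaced by implicit expansion; the line below should produce the same filtered distance matrix:
dist = mask.*abs(X(:) - X(:).');   %// implicit expansion instead of bsxfun (R2016b+)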
This works for a general number of vectors, which may have different lengths:
vectors = {[2.29 2.56 2.77 2.90 2.05] [2.34 2.62 2.67 2.44 2.52] [1 2 3 4]};
% Cell array of data vectors; 3 in this example
s = cellfun(@numel, vectors); % Get vector lengths
v = [vectors{:}]; % Concatenate all vectors into a vector
D = abs(bsxfun(@minus, v, v.')); % Compute distances. This gives a matrix.
% Distances within the same vector will have to be discarded. This will be
% done by replacing those values with NaN, in blocks
bb = arrayfun(@(x) NaN(x), s, 'uniformoutput', false); % Cell array of blocks
B = blkdiag(bb{:}); % NaN mask with those blocks
[~, ind] = min(D(:) + B(:)); % Add that mask. Get arg min in linear index
[ii, jj] = ind2sub(size(D), ind); % Convert to row and column indices
result = v([ii jj]); % Index into concatenated vector
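If you also need to know which vector each of the two values came from, a small sketch building on the variables above could map the concatenated indices back via the cumulative lengths (edges, vecI and vecJ are new names introduced here for illustration):
edges = cumsum(s);            % End position of each vector within v
vecI = find(ii <= edges, 1);  % Vector that contributes v(ii)
vecJ = find(jj <= edges, 1);  % Vector that contributes v(jj)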
As a long comment: if you have access to the Statistics and Machine Learning Toolbox, you can use the K-Nearest Neighbors functions, which offer some advantages, such as:
Handling arrays of different lengths, e.g. when size(A) = [M, 1] and size(B) = [N, 1]
Handling two-dimensional arrays, e.g. when size(A) = [M, d] and size(B) = [N, d]
Handling different distance types, e.g. Euclidean, City block, Chebychev and so on, and even your own custom distance.
Using a KDTree algorithm, which yields very good performance in some special cases.
Although Luis Mendo's answer looks very good for your case, it does not scale the way the K-Nearest Neighbors functions provided by the toolbox do.
Update: example code
% A and B can have any number of rows; they just need the same number of columns (the signal dimension)
A = rand(1000,4);
B = rand(500,4);
% Use any distance you like; some of them are not supported by KDTreeSearcher,
% in which case you should use ExhaustiveSearcher instead
myKnnModel = KDTreeSearcher(A, 'Distance', 'minkowski');
% You can ask for several (K) nearest neighbours and keep them around for later use
[Idx, D] = knnsearch(myKnnModel, B, 'K',2);
% And this answers your particular case
[~, idxA] = min(D(:, 1))
idxB = Idx(idxA)
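To turn those indices back into the actual closest points, something along these lines should work with the variable names from the snippet above:
% Recover the closest pair of rows (one from B, one from A), reusing idxA and idxB
closestInB = B(idxA, :)   % row of B whose nearest neighbour in A is the overall closest
closestInA = A(idxB, :)   % the matching row of A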