Select 个基于给定位置的矩阵条目
Select entries of matrix based on given locations
我有以下矩阵(MxN,其中 M ≤ N):
0.8147 0.9134 0.2785 0.9649
0.9058 0.6324 0.5469 0.1576
0.1270 0.0975 0.9575 0.9706
在每一行中,我想分别 select 以下列条目(每行一个):
idx = [ 3 1 4 ];
这意味着我们保留 (1,3)、(2,1) 和 (3,4) 中的元素,数组的其余部分应该为零。
对于上面的示例,我将得到以下输出:
0 0 0.2785 0
0.9058 0 0 0
0 0 0 0.9706
我目前使用循环生成它,当矩阵大小较大时循环会变慢。
谁能推荐一种更高效的方法?
您可以使用sub2ind
函数将条目索引转换为线性索引。
使用线性索引时,matlab 将矩阵视为长列向量。
org_mat=[0.8147 0.9134 0.2785 0.9649
0.9058 0.6324 0.5469 0.1576
0.1270 0.0975 0.9575 0.9706];
entries=[3,1,4];
linear_entries=sub2ind(size(org_mat),1:length(entries),entries);
new_mat=zeros(size(org_mat));
new_mat(linear_entries)=org_mat(linear_entries);
这应该比sub2ind
快:
m = [0.8147, 0.9134, 0.2785, 0.9649;
0.9058, 0.6324, 0.5469, 0.1576;
0.1270, 0.0975, 0.9575, 0.9706];
n=[3,1,4];
linear = (1:length(n)) + (n-1)*size(m,1);
new_m = zeros(size(m));
new_m(linear) = m(linear);
另一个 answers/comments 中有一些关于性能的讨论。这是一个简单的(构造良好的)for
循环可以很好地完成工作的情况之一,对性能基本上没有影响。
% For some original matrix 'm', and column indexing array 'idx':
x = zeros( size(m) ); % Initialise output of zeros
for ii = 1:numel(idx) % Loop over indices
% Assign the value at the column index for this row
x( ii, idx(ii) ) = m( ii, idx(ii) );
end
此代码可读性强且快速。为了证明 "quick",我在 MATLAB R2017b 上为所有 4 个当前答案的方法 运行 编写了以下基准测试代码。这是输出图。
对于 "small" 矩阵,最多 2^5 列和 2^4 行:
对于 "large" 矩阵,最多 2^15 列和 2^14 行(具有和不具有 bsxfun
解决方案的同一图,因为它会破坏缩放比例):
第一个情节可能有点误导。尽管结果一致(性能排名慢-快是 bsxfun
然后是 sub2ind
然后是手动索引然后循环),y 轴是 10^(-5) 秒,所以基本上哪种方法无关紧要你正在使用!
第二个图表明,对于大型矩阵,这些方法基本上是等效的,除了 bsxfun
很糟糕(此处未显示,但它需要更多内存)。
我会选择更清晰的循环,它可以让您更加灵活,并且 2 年后您会清楚地记得它在您的代码中做了什么。
基准代码:
function benchie()
K = 5; % Max loop variable
T = zeros( K, 4 ); % Timing results
for k = 1:K
M = 2^(k-1); N = 2^k; % size of matrix
m = rand( M, N ); % random matrix
idx = randperm( N, M ); % column indices
% Define anonymous functions with no inputs for timeit, and run
f1 = @() f_sub2ind( m, idx ); T(k,1) = timeit(f1);
f2 = @() f_linear( m, idx ); T(k,2) = timeit(f2);
f3 = @() f_loop( m, idx ); T(k,3) = timeit(f3);
f4 = @() f_bsxfun( m, idx ); T(k,4) = timeit(f4);
end
% Plot results
plot( (1:K)', T, 'linewidth', 2 );
legend( {'sub2ind', 'linear', 'loop', 'bsxfun'} );
xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
ylabel( 'function time (s)' )
end
function f_sub2ind( m, idx )
% Using the in-built sub2ind to generate linear indices, then indexing
lin_idx = sub2ind( size(m), 1:numel(idx), idx );
x = zeros( size(m) );
x( lin_idx ) = m( lin_idx );
end
function f_linear( m, idx )
% Manually calculating linear indices, then indexing
lin_idx = (1:numel(idx)) + (idx-1)*size(m,1);
x = zeros( size(m) );
x( lin_idx ) = m( lin_idx );
end
function f_loop( m, idx )
% Directly indexing in a simple loop
x = zeros( size(m) );
for ii = 1:numel(idx)
x( ii, idx(ii) ) = m( ii, idx(ii) );
end
end
function f_bsxfun( m, idx )
% Using bsxfun to create a logical matrix of desired elements, then masking
% Since R2016b, can use 'x = ( (1:size(m,2)) == idx(:) ) .* m;'
x = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
end
No bsxfun
no party
令 m
为输入矩阵,idx
为具有列索引的向量。您可以从 idx
构建逻辑掩码,然后按元素乘以 m
,如下所示:
result = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
TL;DR - 这是我的建议:
nI = numel(idx);
sz = size(m);
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
post 的其余部分讨论了为什么它工作得更好。
看到所需的输出矩阵主要由零组成,这实际上需要使用 sparse matrices!这不仅会提高性能(尤其是对于较大的矩阵),而且对内存更友好。
我要在 中添加两个函数:
function x = f_sp_loop( m, idx )
nI = numel(idx);
sz = size(m);
x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
for indI = 1:nI
x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
end
end
function x = f_sp_sub2ind( m, idx )
nI = numel(idx);
sz = size(m);
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end
本质上,区别在于我们没有将输出预分配为零数组,而是预分配为稀疏数组。 benchmark1结果如下:
...这提出了一个问题 - 为什么 sparse
方法快一个数量级?
要回答这个问题,我们应该查看基准测试函数内部的实际运行时分布,我们可以使用 profile('-memory','on')
从 profiler. To get even more information, we can make the profiler output memory consumption info 中获得。在 运行 稍微短一点的基准测试版本 2 之后,它只针对 k
的最高值运行,我们得到:
所以我们可以总结出几件事:
- 绝大多数运行时间都花在分配和释放内存上,这就是为什么这些算法看起来具有几乎相同的性能。因此,如果我们减少内存分配,我们将直接节省时间(
sparse
的一大优点!)。
- 即使
sub2ind
和 loop
方法看起来相同,我们仍然可以在两者之间建立 "winner"(见底部图像上的紫色框)- sub2ind
! sub2ind
为 32 毫秒,循环为 41 毫秒。
稀疏循环方法出奇地慢,正如我们被mlint
:
警告的那样
Explanation
Code Analyzer detects an indexing pattern for a sparse array that is likely to be slow. An assignment that changes the nonzero pattern of a sparse array can cause this error because such assignments result in considerable overhead.
Suggested Action
If possible, build sparse arrays using sparse
as follows, and do not use indexed assignments (such as C(4) = B) to build them:
- Create separate index and value arrays.
- Call sparse to assemble the index and value arrays.
If you must use indexed assignments to build sparse arrays, you can optimize performance by first preallocating the sparse array with spalloc
.
If the code changes only array elements that are already nonzero, then the overhead is reasonable. Suppress this message as described in Adjust Code Analyzer Message Indicators and Messages.
For more information, see “Constructing Sparse Matrices”.
- 结合了两全其美的方法,即
sparse
的内存节省和 sub2ind
的矢量化,似乎是运行时间仅为 3 毫秒的最佳方法!
1图表制作代码:
function q51605093()
K = 15; % Max loop variable
T = zeros( K, 4 ); % Timing results
for k = 1:K
M = 2^(k-1); N = 2^k; % size of matrix
m = rand( M, N ); % random matrix
idx = randperm( N, M ); % column indices
% Define anonymous functions with no inputs, for timeit, and run
f = cell(4,1);
f{1} = @() f_sub2ind( m, idx );
f{2} = @() f_loop( m, idx );
f{3} = @() f_sp_loop( m, idx );
f{4} = @() f_sp_sub2ind( m, idx );
T(k,:) = cellfun(@timeit, f);
if k == 5 % test equality during one of the runs
R = cellfun(@feval, f, 'UniformOutput', false);
assert(isequal(R{:}));
end
end
% Plot results
figure();
semilogy( (1:K).', T, 'linewidth', 2 ); grid on; xticks(0:K);
legend( {'sub2ind', 'loop', 'sp\_loop', 'sp\_sub2ind'}, 'Location', 'NorthWest' );
xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
ylabel( 'function time (s)' )
end
function x = f_sub2ind( m, idx )
% Using the in-built sub2ind to generate linear indices, then indexing
lin_idx = sub2ind( size(m), 1:numel(idx), idx );
x = zeros( size(m) );
x( lin_idx ) = m( lin_idx );
end
function x = f_loop( m, idx )
% Directly indexing in a simple loop
x = zeros( size(m) );
for ii = 1:numel(idx)
x( ii, idx(ii) ) = m( ii, idx(ii) );
end
end
function x = f_sp_loop( m, idx )
nI = numel(idx);
sz = size(m);
x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
for indI = 1:nI
x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
end
end
function x = f_sp_sub2ind( m, idx )
nI = numel(idx);
sz = size(m);
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end
2 分析代码:
function q51605093_MB()
K = 15; % Max loop variable
M = 2^(K-1); N = 2^K; % size of matrix
m = rand( M, N ); % random matrix
idx = randperm( N, M ); % column indices
% Define anonymous functions with no inputs, for timeit, and run
f = cell(4,1);
f{1} = f_sub2ind( m, idx );
f{2} = f_loop( m, idx );
f{3} = f_sp_loop( m, idx );
f{4} = f_sp_sub2ind( m, idx );
% assert(isequal(f{:}));
end
... the rest is the same as above
我有以下矩阵(MxN,其中 M ≤ N):
0.8147 0.9134 0.2785 0.9649
0.9058 0.6324 0.5469 0.1576
0.1270 0.0975 0.9575 0.9706
在每一行中,我想分别 select 以下列条目(每行一个):
idx = [ 3 1 4 ];
这意味着我们保留 (1,3)、(2,1) 和 (3,4) 中的元素,数组的其余部分应该为零。
对于上面的示例,我将得到以下输出:
0 0 0.2785 0
0.9058 0 0 0
0 0 0 0.9706
我目前使用循环生成它,当矩阵大小较大时循环会变慢。
谁能推荐一种更高效的方法?
您可以使用sub2ind
函数将条目索引转换为线性索引。
使用线性索引时,matlab 将矩阵视为长列向量。
org_mat=[0.8147 0.9134 0.2785 0.9649
0.9058 0.6324 0.5469 0.1576
0.1270 0.0975 0.9575 0.9706];
entries=[3,1,4];
linear_entries=sub2ind(size(org_mat),1:length(entries),entries);
new_mat=zeros(size(org_mat));
new_mat(linear_entries)=org_mat(linear_entries);
这应该比sub2ind
快:
m = [0.8147, 0.9134, 0.2785, 0.9649;
0.9058, 0.6324, 0.5469, 0.1576;
0.1270, 0.0975, 0.9575, 0.9706];
n=[3,1,4];
linear = (1:length(n)) + (n-1)*size(m,1);
new_m = zeros(size(m));
new_m(linear) = m(linear);
另一个 answers/comments 中有一些关于性能的讨论。这是一个简单的(构造良好的)for
循环可以很好地完成工作的情况之一,对性能基本上没有影响。
% For some original matrix 'm', and column indexing array 'idx':
x = zeros( size(m) ); % Initialise output of zeros
for ii = 1:numel(idx) % Loop over indices
% Assign the value at the column index for this row
x( ii, idx(ii) ) = m( ii, idx(ii) );
end
此代码可读性强且快速。为了证明 "quick",我在 MATLAB R2017b 上为所有 4 个当前答案的方法 运行 编写了以下基准测试代码。这是输出图。
对于 "small" 矩阵,最多 2^5 列和 2^4 行:
对于 "large" 矩阵,最多 2^15 列和 2^14 行(具有和不具有
bsxfun
解决方案的同一图,因为它会破坏缩放比例):
第一个情节可能有点误导。尽管结果一致(性能排名慢-快是 bsxfun
然后是 sub2ind
然后是手动索引然后循环),y 轴是 10^(-5) 秒,所以基本上哪种方法无关紧要你正在使用!
第二个图表明,对于大型矩阵,这些方法基本上是等效的,除了 bsxfun
很糟糕(此处未显示,但它需要更多内存)。
我会选择更清晰的循环,它可以让您更加灵活,并且 2 年后您会清楚地记得它在您的代码中做了什么。
基准代码:
function benchie()
K = 5; % Max loop variable
T = zeros( K, 4 ); % Timing results
for k = 1:K
M = 2^(k-1); N = 2^k; % size of matrix
m = rand( M, N ); % random matrix
idx = randperm( N, M ); % column indices
% Define anonymous functions with no inputs for timeit, and run
f1 = @() f_sub2ind( m, idx ); T(k,1) = timeit(f1);
f2 = @() f_linear( m, idx ); T(k,2) = timeit(f2);
f3 = @() f_loop( m, idx ); T(k,3) = timeit(f3);
f4 = @() f_bsxfun( m, idx ); T(k,4) = timeit(f4);
end
% Plot results
plot( (1:K)', T, 'linewidth', 2 );
legend( {'sub2ind', 'linear', 'loop', 'bsxfun'} );
xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
ylabel( 'function time (s)' )
end
function f_sub2ind( m, idx )
% Using the in-built sub2ind to generate linear indices, then indexing
lin_idx = sub2ind( size(m), 1:numel(idx), idx );
x = zeros( size(m) );
x( lin_idx ) = m( lin_idx );
end
function f_linear( m, idx )
% Manually calculating linear indices, then indexing
lin_idx = (1:numel(idx)) + (idx-1)*size(m,1);
x = zeros( size(m) );
x( lin_idx ) = m( lin_idx );
end
function f_loop( m, idx )
% Directly indexing in a simple loop
x = zeros( size(m) );
for ii = 1:numel(idx)
x( ii, idx(ii) ) = m( ii, idx(ii) );
end
end
function f_bsxfun( m, idx )
% Using bsxfun to create a logical matrix of desired elements, then masking
% Since R2016b, can use 'x = ( (1:size(m,2)) == idx(:) ) .* m;'
x = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
end
No
bsxfun
no party
令 m
为输入矩阵,idx
为具有列索引的向量。您可以从 idx
构建逻辑掩码,然后按元素乘以 m
,如下所示:
result = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
TL;DR - 这是我的建议:
nI = numel(idx);
sz = size(m);
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
post 的其余部分讨论了为什么它工作得更好。
看到所需的输出矩阵主要由零组成,这实际上需要使用 sparse matrices!这不仅会提高性能(尤其是对于较大的矩阵),而且对内存更友好。
我要在
function x = f_sp_loop( m, idx )
nI = numel(idx);
sz = size(m);
x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
for indI = 1:nI
x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
end
end
function x = f_sp_sub2ind( m, idx )
nI = numel(idx);
sz = size(m);
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end
本质上,区别在于我们没有将输出预分配为零数组,而是预分配为稀疏数组。 benchmark1结果如下:
...这提出了一个问题 - 为什么 sparse
方法快一个数量级?
要回答这个问题,我们应该查看基准测试函数内部的实际运行时分布,我们可以使用 profile('-memory','on')
从 profiler. To get even more information, we can make the profiler output memory consumption info 中获得。在 运行 稍微短一点的基准测试版本 2 之后,它只针对 k
的最高值运行,我们得到:
所以我们可以总结出几件事:
- 绝大多数运行时间都花在分配和释放内存上,这就是为什么这些算法看起来具有几乎相同的性能。因此,如果我们减少内存分配,我们将直接节省时间(
sparse
的一大优点!)。 - 即使
sub2ind
和loop
方法看起来相同,我们仍然可以在两者之间建立 "winner"(见底部图像上的紫色框)-sub2ind
!sub2ind
为 32 毫秒,循环为 41 毫秒。 稀疏循环方法出奇地慢,正如我们被
警告的那样mlint
:Explanation
Code Analyzer detects an indexing pattern for a sparse array that is likely to be slow. An assignment that changes the nonzero pattern of a sparse array can cause this error because such assignments result in considerable overhead.
Suggested Action
If possible, build sparse arrays using
sparse
as follows, and do not use indexed assignments (such as C(4) = B) to build them:- Create separate index and value arrays.
- Call sparse to assemble the index and value arrays.
If you must use indexed assignments to build sparse arrays, you can optimize performance by first preallocating the sparse array with
spalloc
.If the code changes only array elements that are already nonzero, then the overhead is reasonable. Suppress this message as described in Adjust Code Analyzer Message Indicators and Messages.
For more information, see “Constructing Sparse Matrices”.
- 结合了两全其美的方法,即
sparse
的内存节省和sub2ind
的矢量化,似乎是运行时间仅为 3 毫秒的最佳方法!
1图表制作代码:
function q51605093()
K = 15; % Max loop variable
T = zeros( K, 4 ); % Timing results
for k = 1:K
M = 2^(k-1); N = 2^k; % size of matrix
m = rand( M, N ); % random matrix
idx = randperm( N, M ); % column indices
% Define anonymous functions with no inputs, for timeit, and run
f = cell(4,1);
f{1} = @() f_sub2ind( m, idx );
f{2} = @() f_loop( m, idx );
f{3} = @() f_sp_loop( m, idx );
f{4} = @() f_sp_sub2ind( m, idx );
T(k,:) = cellfun(@timeit, f);
if k == 5 % test equality during one of the runs
R = cellfun(@feval, f, 'UniformOutput', false);
assert(isequal(R{:}));
end
end
% Plot results
figure();
semilogy( (1:K).', T, 'linewidth', 2 ); grid on; xticks(0:K);
legend( {'sub2ind', 'loop', 'sp\_loop', 'sp\_sub2ind'}, 'Location', 'NorthWest' );
xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
ylabel( 'function time (s)' )
end
function x = f_sub2ind( m, idx )
% Using the in-built sub2ind to generate linear indices, then indexing
lin_idx = sub2ind( size(m), 1:numel(idx), idx );
x = zeros( size(m) );
x( lin_idx ) = m( lin_idx );
end
function x = f_loop( m, idx )
% Directly indexing in a simple loop
x = zeros( size(m) );
for ii = 1:numel(idx)
x( ii, idx(ii) ) = m( ii, idx(ii) );
end
end
function x = f_sp_loop( m, idx )
nI = numel(idx);
sz = size(m);
x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
for indI = 1:nI
x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
end
end
function x = f_sp_sub2ind( m, idx )
nI = numel(idx);
sz = size(m);
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end
2 分析代码:
function q51605093_MB()
K = 15; % Max loop variable
M = 2^(K-1); N = 2^K; % size of matrix
m = rand( M, N ); % random matrix
idx = randperm( N, M ); % column indices
% Define anonymous functions with no inputs, for timeit, and run
f = cell(4,1);
f{1} = f_sub2ind( m, idx );
f{2} = f_loop( m, idx );
f{3} = f_sp_loop( m, idx );
f{4} = f_sp_sub2ind( m, idx );
% assert(isequal(f{:}));
end
... the rest is the same as above