如何在存在 for 循环和矩阵向量乘法的情况下减少计算时间

How to reduce the time of computation in presence of for loops and matrix-vector multiplication

我正在编写一个程序,其中计算时间非常重要,因此我必须以减少时间的方式编写代码。在下面,我写了一个代码,但是如果我的向量长度变大,它会很耗时。无论如何,有没有更快的方式产生相同的结果?

K1 = [1 2 3 4 5];  K2 = [6 7 8 9 10];
kt1 = [1.5 3 4.5]; kt2 = [6.5 8 9.5];
numk1 = bsxfun(@minus,K1.',kt1);
denomk1 = bsxfun(@minus, kt1.',kt1)+eye(numel(kt1));
numk2 = bsxfun(@minus,K2.',kt2);
denomk2 = bsxfun(@minus, kt2.', kt2)+eye(numel(kt2));

for j=1:numel(kt1)
    for jj=1:numel(kt2)
        k1_dir = bsxfun(@rdivide,numk1,denomk1(j,:)); k1_dir(:,j)=[];
        k_dir1 = prod(k1_dir,2);
        k2_dir = bsxfun(@rdivide,numk2,denomk2(jj,:)); k2_dir(:,jj)=[];
        k_dir2 = prod(k2_dir,2);
        k1_k2(:,:,j,jj) = k_dir1 * k_dir2';
    end
end 

在上面的代码中,随着K1K2的长度增加,kt1kt2的长度也会增加。因此,对于长向量长度,此代码非常耗时。

vectorization 上全力以赴,这可能是替换问题中列出的代码的循环部分的一种方法 -

%// Size parameters
M1 = numel(K1);
N1 = numel(kt1);
M2 = numel(K2);
N2 = numel(kt2);

%// Indices to be removed from k1_dir & k2_dir. 
%// In our vectorized version, we will just use these to set 
%// corresponding elements in vectorized versions of k1_dir & k2_dir
%// to ONES, as later on PROD would take care of it.
rm_idx1 = bsxfun(@plus,[1:M1]',[0:N1-1]*(M1*N1+M1)); %//'
rm_idx2 = bsxfun(@plus,[1:M2]',[0:N2-1]*(M2*N2+M2)); %//'

%// Get vectorized version of k1_dir, as k1_dirv
k1_dirv = bsxfun(@rdivide,numk1,permute(denomk1,[3 2 1]));
k1_dirv(rm_idx1) = 1;
k_dir1v = prod(k1_dirv,2);

%// Get vectorized version of k2_dir, as k2_dirv
k2_dirv = bsxfun(@rdivide,numk2,permute(denomk2,[3 2 1]))
k2_dirv(rm_idx2) = 1;
k_dir2v = prod(k2_dirv,2);

%// Get vectorized version of k1_k2, as k1_k2v
k1_k2v = bsxfun(@times,k_dir1v,permute(k_dir2v,[2 1 4 3]));

快速运行时测试:

像这样设置输入 -

SZ1 = 100;
SZ2 = 100;

K1 = randi(9,1,SZ1);
K2 = randi(9,1,SZ1);

kt1 = randi(9,1,SZ2);
kt2 = randi(9,1,SZ2);

原始循环部分的运行时间(在添加带有零的预分配代码以实现更公平的基准测试之后)和建议的矢量化方法是 -

-------------------------- With Original Loopy Approach
Elapsed time is 1.086666 seconds.
-------------------------- With Proposed Vectorized Approach
Elapsed time is 0.178805 seconds.

似乎 JIT 没有显示它的魔力,至少当 bsxfun 在嵌套循环中使用时没有,而且你需要在每次迭代中索引到那个巨大的 4D 数组对你没有帮助。因此,在这种情况下全速进行矢量化更有意义!