How to avoid loops by vectorizing the code below?

The code below is correct, but I want to vectorize it (and possibly move it to the GPU) to improve speed.

How can I convert it to vectorized form?

RF = 4;     
inhibatory = 0;    
overlap=3;   
act_funct = 'sig';
gap = RF-overlap;    
Image1 = rand(30,22);  
Image2 = rand(27,19);
size_image2 = size(Image2); % 27x19; used for the index bounds below
Image3 = rand(30,22); 
de_act_output = de_activate_Mat(Image1,act_funct); % finding derivative of the matrix. e.g. de_act_output = act_output.*(1-act_output) in case of sigmoid. 
for u=1:size(Image1,1)
    for v=1:size(Image1,2)
        sum_val=0;
        iLowMax=max(ceil((u-(RF+inhibatory))/(gap-inhibatory)),1);
        iHighMax=min(floor((u-1)/(gap-inhibatory))+1, size_image2(1));
        jLowMax=max(ceil((v-(RF+inhibatory))/(gap-inhibatory)),1);
        jHighMax = min(floor((v-1)/(gap-inhibatory))+1, size_image2(2));
        sum_sens = sum(sum(Image2(iLowMax:iHighMax,jLowMax:jHighMax)));
        sum_val = sum_sens(:,:) .* Image3(u,v);
        result(u,v) = de_act_output(u,v) .* sum_val;
    end
end

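Inside the nested loops you are indexing Image2 with iLowMax:iHighMax, jLowMax:jHighMax, which creates a parallelogram-like block structure that does not lead to any easily vectorizable code. If performance is paramount for your case, you can still go for full vectorization, and convolution looks like a good fit for that step. Listed here are some tweaks that make everything around that step faster by pre-calculating most of the other values, which should give an appreciable speedup. Here's the implementation: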

U = 1:size(Image1,1); %// Create arrays of iteration steps
V = 1:size(Image1,2);

%// Calculate arrays of low-high row and column indices 
iLowMax=max(ceil((U-(RF+inhibatory))/(gap-inhibatory)),1);
iHighMax=min(floor((U-1)/(gap-inhibatory))+1, size_image2(1));

jLowMax=max(ceil((V-(RF+inhibatory))/(gap-inhibatory)),1);
jHighMax = min(floor((V-1)/(gap-inhibatory))+1, size_image2(2));

sens_sums(size(Image1,1),size(Image1,2)) = 0; %// Pre-allocation
for u=1:size(Image1,1)
    for v=1:size(Image1,2)
        sens = Image2(iLowMax(u):iHighMax(u),jLowMax(v):jHighMax(v));
        sens_sums(u,v) = sum(sens(:));
    end
end
result = sens_sums.*Image3.*de_act_output;
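
For completeness, the remaining double loop can also be removed. The following is only a sketch of one possible approach, not the method from the answer above: it swaps in a summed-area table (2-D prefix sums of Image2), so that each rectangular block sum becomes a combination of four table lookups. The names P and step are hypothetical helpers introduced here, and since de_activate_Mat is not shown in the question, the sketch assumes the sigmoid form a.*(1-a) quoted in the question's comment as a stand-in.

```matlab
% Sketch: block sums via a summed-area table (2-D prefix sums), no loops.
RF = 4; inhibatory = 0; overlap = 3;
gap = RF - overlap;
step = gap - inhibatory;              % hypothetical helper name
Image1 = rand(30,22);
Image2 = rand(27,19);
Image3 = rand(30,22);
size_image2 = size(Image2);

% Assumption: stand-in for de_activate_Mat(Image1,'sig'), using the
% sigmoid derivative form a.*(1-a) mentioned in the question.
de_act_output = Image1 .* (1 - Image1);

U = 1:size(Image1,1);                 % row iteration steps
V = 1:size(Image1,2);                 % column iteration steps

% Same low/high index arrays as in the answer above
iLowMax  = max(ceil((U-(RF+inhibatory))/step), 1);
iHighMax = min(floor((U-1)/step)+1, size_image2(1));
jLowMax  = max(ceil((V-(RF+inhibatory))/step), 1);
jHighMax = min(floor((V-1)/step)+1, size_image2(2));

% P(a+1,b+1) holds sum(sum(Image2(1:a,1:b))); row 1 and column 1 stay zero.
P = zeros(size_image2 + 1);
P(2:end,2:end) = cumsum(cumsum(Image2,1),2);

% Inclusion-exclusion on the four corners of every block at once.
% Indexing P with a row vector and a column vector of indices returns
% the full numel(U)-by-numel(V) grid of lookups.
sens_sums = P(iHighMax+1,jHighMax+1) - P(iLowMax,jHighMax+1) ...
          - P(iHighMax+1,jLowMax)    + P(iLowMax,jLowMax);

result = sens_sums .* Image3 .* de_act_output;
```

Because the prefix sums add the values in a different order than the loop does, the two results should agree only up to floating-point rounding, so compare them with a tolerance rather than isequal. With the parameters shown here, where gap-inhibatory equals 1, every block is a clamped window of RF+inhibatory+1 rows and columns, so the same sens_sums should also come out of conv2(Image2, ones(RF+inhibatory+1), 'full') cropped to size(Image1). The cumsum and indexing operations used here are elementwise or array-level, so they are plausible candidates for gpuArray inputs, assuming the Parallel Computing Toolbox is available.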