Parallel Computing for video compression in MATLAB

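I need some help with parallel programming in MATLAB. To be clear, I have never implemented parallelization in any of my code before. I have a video compression engine that I developed as part of my university project; it is a basic version of an H.264 video compression engine. I have to apply the parallel processing techniques available in MATLAB to this engine. Basically, I have a function that splits an image frame into blocks (whose size is predetermined), and I am trying to parallelize this code partially or fully. Where there are no dependencies between the blocks, I used "parfor" and it works well; that implementation is below. Now I am trying to parallelize the case where the blocks do depend on each other.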

function [reconstructed_frames, residual_blocks, encoded_data_cell, bit_count_coeff_per_frame, bit_count_mv_per_frame_cell, real_avg_bit_count_per_row_per_frame, total_bit_count_per_frame, QP_used_in_row, scene_change_frames, SAD_value_per_frame] = block_prediction_parallalized(Y, block_size, srch_rng, QP, I_period,pathToResiduals, no_ref_frames, VBS_enable, Fast_ME_enable,Frac_ME_enable,lambda, RC_flag, avg_bit_count_row_vary_QP, target_bits_per_frame)
% Predicts frames using inter prediction and intra prediction,
% with the given I-period
Y = int64(Y);
[no_rows, no_cols, no_frames] = size(Y);
no_blocks_in_row = no_cols / block_size;   % number of blocks across one row
no_blocks_in_col = no_rows / block_size;   % number of blocks down one column
total_blocks_per_frame = (no_rows*no_cols)/(block_size*block_size);
encoded_data_cell = cell(1,total_blocks_per_frame,no_frames);
encoded_data_per_frame = cell(1, total_blocks_per_frame);
ref_frame_inter = zeros(no_rows, no_cols, 1, 'int64') + 128;
bit_count_coeff_per_frame = 0;
bit_count_mv_per_frame_cell = 0;
real_avg_bit_count_per_row_per_frame = 0;
QP_used_in_row = zeros(1,no_blocks_in_col,no_frames);
QP_used_in_row(:,:,:) = QP;
scene_change_frames = [];
SAD_value_per_frame = 0;
ref_frame_index_count = 1;
for k = 1:no_frames
    if k>1
        ref_frame_inter(:,:,1) = Y(:,:,k-1);
    end
    block_segment = 0;
    bitCountMV = 0;
     for row = 1 : block_size : no_rows - block_size + 1
         for col = 1 : block_size : no_cols - block_size + 1
            block_segment = block_segment + 1;
            row_start = row;
            row_end = row_start + block_size - 1;
            col_start = col;
            col_end = col_start + block_size - 1;
            row_end = min(row_end, no_rows);
            col_end = min(col_end, no_cols);
        
            % Making an array of blocks of size block_size
            block_list_currframe(:,:,block_segment) = Y(row_start:row_end, col_start:col_end, k);
            location_pointers(block_segment,:) = [row_start row_end col_start col_end];           
         end         
     end
     %Parallelizing the block encoding process
     max_index = size(block_list_currframe,3);
     %Loop for processing blocks concurrently
     parfor block_index = 1:max_index
        % Inter-prediction of one block
        [encoded_data, reconstructed_block, residual_block, bit_count_per_block] = paral_debug_funct(block_index, location_pointers, block_list_currframe, ref_frame_inter, block_size, srch_rng, QP, no_rows, no_cols, ref_frame_index_count, VBS_enable, Fast_ME_enable, Frac_ME_enable, lambda);
        
        %Buffering the output of each worker
        reconstructed_blocks(:,:,block_index) = reconstructed_block;
        residual_blocks_in_frame(:,:,block_index) = residual_block;
        encoded_data_per_frame(:,:, block_index) = encoded_data;
        total_bit_count_per_block(block_index) = bit_count_per_block;
     end
     
     %Processing the buffered outputs obtained after processing all the
     %blocks.
     for block_index = 1:size(block_list_currframe,3)
%          [row_start, row_end, col_start, col_end] = location_pointers(block_index,:);
        row_start = location_pointers(block_index, 1);
        row_end = location_pointers(block_index, 2);
        col_start = location_pointers(block_index, 3);
        col_end = location_pointers(block_index, 4);
        reconstructed_frames(row_start:row_end, col_start:col_end, k) = reconstructed_blocks(:,:,block_index);
        residual_blocks(:,:,block_index,k) =  residual_blocks_in_frame(:,:,block_index);
        encoded_data_cell(:,:,block_index,k) = encoded_data_per_frame(:,:,block_index);
     end
     total_bit_count_per_frame(k) = sum(total_bit_count_per_block, 'all');
end
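For the block-splitting step done by the double loop above, mat2cell can produce the blocks directly as a cell array, which is also the form the answer's pseudocode iterates over with parfor (input_blocks{i}). This is a minimal sketch, assuming the frame dimensions are exact multiples of block_size, as they are for CIF; the random frame only stands in for Y(:,:,k).

```matlab
% Split one frame into a cell array of 16x16 blocks with mat2cell.
% ASSUMPTION: frame dimensions are exact multiples of block_size (true for CIF).
block_size = 16;
frame = randi([0 255], 288, 352);                         % stand-in for Y(:,:,k)
[no_rows, no_cols] = size(frame);
row_sizes = repmat(block_size, 1, no_rows / block_size);  % 18 block rows
col_sizes = repmat(block_size, 1, no_cols / block_size);  % 22 block columns
blocks = mat2cell(frame, row_sizes, col_sizes);           % 18x22 cell array

% blocks(:) is column-major; transposing first gives the same row-major
% order as block_segment in the double loop (left to right, top to bottom).
blocks_rowmajor = reshape(blocks.', 1, []);               % 1x396 cell array

% cell2mat reassembles the blocks into the full frame.
restored = cell2mat(blocks);
```

A parfor over 1:numel(blocks_rowmajor) then receives one block per iteration as a sliced input instead of broadcasting the whole block list to every worker.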

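In the code above, the blocks do not have to communicate with each other. Now I need them to communicate at some point, because processing some blocks has to wait until a previous block has finished. I think the figure below makes this clearer.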
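One way to turn such dependencies into something parfor can handle is a wavefront schedule. This is a minimal sketch, assuming the H.264-style pattern in which each 16x16 block needs its left, top-left, top and top-right neighbours first; the pattern in the question's figure may differ. Blocks are grouped into "waves" so that everything a block depends on lies in an earlier wave: the waves must run one after another, but the blocks inside one wave can run in parallel. The names wave and blocks_per_wave are illustrative.

```matlab
% Wavefront schedule for one CIF frame (288x352) split into 16x16 blocks.
% ASSUMPTION: a block needs its left, top-left, top and top-right
% neighbours to be finished first (H.264-style); the real pattern may differ.
block_size = 16;
n_block_rows = 288 / block_size;              % 18 block rows
n_block_cols = 352 / block_size;              % 22 block columns

% r(i,j) = i is the block row, c(i,j) = j is the block column
[c, r] = meshgrid(1:n_block_cols, 1:n_block_rows);

% Wave of block (r,c): the left and top-right neighbours are 1 wave earlier,
% the top neighbour 2 waves earlier and the top-left 3 waves earlier, so all
% dependencies of a block lie in earlier waves.
wave = 2 * (r - 1) + c;

n_waves = max(wave(:));                       % serial steps per frame
blocks_per_wave = accumarray(wave(:), 1);     % blocks that can run together in each step
fprintf('%d blocks, %d waves, at most %d blocks per wave\n', ...
        numel(wave), n_waves, max(blocks_per_wave));
% prints: 396 blocks, 56 waves, at most 11 blocks per wave
```

If only the left and top neighbours matter, wave = r + c - 1 gives 39 waves with up to 18 blocks each instead. In either case the parallelism available per step is much smaller than the 396 independent blocks of the dependency-free case.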

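I have read that two kinds of parallel processing are available, multithreading and multiprocessing, and I think multithreading suits my use case. I have read about spmd and parfeval, but the examples I found are usually not very detailed. Since I am new to parallel processing, these options feel confusing and it is hard to decide which one to focus on. I think what I want is for the workers to be able to communicate with each other during execution? I am not sure. If you need a rough idea of the data sizes: video frame size = 288x352 (CIF format), block size = 16, number of frames = 21.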
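The workers do not necessarily have to talk to each other; the client can do the bookkeeping instead. The sketch below assumes Parallel Computing Toolbox and a left/top dependency between blocks. It uses parfeval to submit each block as soon as its last dependency has finished, and fetchNext to react to whichever submitted block finishes first. encode_block is a hypothetical toy stand-in for the real per-block encoder (it just returns left + top + 1), so the result can be checked against a serial loop.

```matlab
% Dataflow scheduling with parfeval: the client launches a block as soon as
% its dependencies are finished, so workers never exchange messages.
% ASSUMPTIONS: each block depends on its left and top neighbours only, and
% encode_block is a hypothetical toy stand-in for the real block encoder.
n_block_rows = 18;  n_block_cols = 22;
encode_block = @(left, top) left + top + 1;   % hypothetical toy "encoder"

pool = gcp();                                 % start or reuse the parallel pool
[c, r] = meshgrid(1:n_block_cols, 1:n_block_rows);
waiting_on = double(c > 1) + double(r > 1);   % unfinished dependencies per block
result = zeros(n_block_rows, n_block_cols);

fut_row = 1;  fut_col = 1;                    % block handled by each future
F = parfeval(pool, encode_block, 1, 0, 0);    % block (1,1) has no dependencies

for n_done = 1:numel(result)
    [k, value] = fetchNext(F);                % wait for any unread future to finish
    br = fut_row(k);  bc = fut_col(k);
    result(br, bc) = value;
    % Release the right and bottom neighbours once all their dependencies are done
    for nb = [br, bc + 1; br + 1, bc].'
        nr = nb(1);  nc = nb(2);
        if nr > n_block_rows || nc > n_block_cols
            continue;
        end
        waiting_on(nr, nc) = waiting_on(nr, nc) - 1;
        if waiting_on(nr, nc) == 0
            left = 0;  top = 0;
            if nc > 1, left = result(nr, nc - 1); end
            if nr > 1, top  = result(nr - 1, nc); end
            F(end + 1) = parfeval(pool, encode_block, 1, left, top);
            fut_row(end + 1) = nr;  fut_col(end + 1) = nc;
        end
    end
end
```

Compared with one parfor per wave, this avoids waiting for the slowest block of each wave, at the cost of one parfeval call per block; with blocks as small as 16x16 the scheduling overhead may outweigh the gain, which only measurement on the real encoder can tell. If a thread-based pool is preferred, parpool("Threads") can be called before gcp(), but whether every function the encoder uses is supported on thread workers would have to be checked. Direct worker-to-worker messaging inside spmd (labSend/labReceive, named spmdSend/spmdReceive in newer releases) is the alternative when data must move between workers rather than through the client.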

Thanks!

P.S. Sorry for the long post; I tried to explain it as clearly as possible.

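You can use parfor inside an ordinary (non-parallel) for loop. The outer, serial loop visits the groups of blocks (here, colors) in dependency order, and the inner parfor processes all blocks of one group in parallel, which works as long as blocks of the same color do not depend on each other: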

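previous_blocks = {};
for color = ["green", "red", "blue"]    % serial: each color group waits for the previous one
  % extract_blocks is a hypothetical helper returning a cell array of the blocks of this color
  input_blocks = extract_blocks(img, color);
  processed_blocks = cell(1, numel(input_blocks));
  parfor i = 1:numel(input_blocks)       % parallel: blocks of one color are independent
    processed_blocks{i} = process_based_on_previous_blocks(i, input_blocks{i}, previous_blocks);
  end
  previous_blocks = processed_blocks;
  % place_blocks is a hypothetical helper writing the blocks back to their original positions
  img = place_blocks(img, processed_blocks, color);
end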
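The same pattern made concrete for a CIF frame, as a sketch: the "colors" become the anti-diagonal waves of the block grid, which is valid only under the assumption that each block depends on its left and top neighbours (with an additional top-right dependency the waves would need 2*(r-1) + c instead). The toy computation left + top + 1 is a hypothetical stand-in for the real encoder, so the result can be compared with a plain serial raster-order loop.

```matlab
% Answer's pattern made concrete: serial loop over waves, parfor inside.
% ASSUMPTIONS: left/top dependency only, so wave = r + c - 1; the toy
% "encoder" (left + top + 1) is a hypothetical stand-in for the real one.
n_block_rows = 18;  n_block_cols = 22;            % CIF frame, 16x16 blocks
[c, r] = meshgrid(1:n_block_cols, 1:n_block_rows);
wave = r + c - 1;                                 % anti-diagonal index
result = zeros(n_block_rows, n_block_cols);

for w = 1:max(wave(:))                            % serial: waves must run in order
    idx = find(wave == w);                        % linear indices of this wave's blocks
    rw = r(idx);  cw = c(idx);
    out = zeros(numel(idx), 1);
    parfor k = 1:numel(idx)                       % parallel: blocks in a wave are independent
        left = 0;  top = 0;
        if cw(k) > 1, left = result(rw(k), cw(k) - 1); end
        if rw(k) > 1, top  = result(rw(k) - 1, cw(k)); end
        out(k) = left + top + 1;
    end
    result(idx) = out;                            % write the wave back before the next one
end
```

Because result is only read inside the parfor, MATLAB treats it as a broadcast variable, while rw, cw and out are sliced by k; the write-back on the client between waves is what enforces the ordering.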