Parallel Computing for video compression in MATLAB
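I need some help with parallel programming in MATLAB. To be clear, I have never used any parallelization technique in my code before.
I have a video compression engine that I developed as part of a university project. It is a basic version of an H.264 video compression engine. I have to apply the parallel processing techniques available in MATLAB to this engine. Basically, I have a function that divides an image frame into blocks (of a predetermined block size). I am trying to parallelize this piece of code, partially or completely. When there is no dependency between blocks, I used "parfor" and it works well. I have posted that implementation below. Now I am trying to parallelize the case where there are dependencies between blocks.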
    function [reconstructed_frames, residual_blocks, encoded_data_cell, ...
            bit_count_coeff_per_frame, bit_count_mv_per_frame_cell, ...
            real_avg_bit_count_per_row_per_frame, total_bit_count_per_frame, ...
            QP_used_in_row, scene_change_frames, SAD_value_per_frame] = ...
            block_prediction_parallalized(Y, block_size, srch_rng, QP, I_period, ...
            pathToResiduals, no_ref_frames, VBS_enable, Fast_ME_enable, ...
            Frac_ME_enable, lambda, RC_flag, avg_bit_count_row_vary_QP, ...
            target_bits_per_frame)
    % Function to predict frames based on inter prediction and intra prediction,
    % with the given I-period
    Y = int64(Y);
    [no_rows, no_cols, no_frames] = size(Y);
    no_blocks_in_row = (no_cols*block_size)/(block_size*block_size);
    no_blocks_in_col = (no_rows*block_size)/(block_size*block_size);
    total_blocks_per_frame = (no_rows*no_cols)/(block_size*block_size);
    encoded_data_cell = cell(1,total_blocks_per_frame,no_frames);
    encoded_data_per_frame = cell(1, total_blocks_per_frame);
    ref_frame_inter = zeros(no_rows, no_cols, 1, 'int64') + 128;
    bit_count_coeff_per_frame = 0;
    bit_count_mv_per_frame_cell = 0;
    real_avg_bit_count_per_row_per_frame = 0;
    QP_used_in_row = zeros(1,no_blocks_in_col,no_frames);
    QP_used_in_row(:,:,:) = QP;
    scene_change_frames = [];
    SAD_value_per_frame = 0;
    ref_frame_index_count = 1;
    for k = 1:no_frames
        if k>1
            ref_frame_inter(:,:,1) = Y(:,:,k-1);
        end
        block_segment = 0;
        bitCountMV = 0;
        for row = 1 : block_size : no_rows - block_size + 1
            for col = 1 : block_size : no_cols - block_size + 1
                block_segment = block_segment + 1;
                row_start = row;
                row_end = row_start + block_size - 1;
                col_start = col;
                col_end = col_start + block_size - 1;
                row_end = min(row_end, no_rows);
                col_end = min(col_end, no_cols);
                % Making an array of blocks of size block_size
                block_list_currframe(:,:,block_segment) = Y(row_start:row_end, col_start:col_end, k);
                location_pointers(block_segment,:) = [row_start row_end col_start col_end];
            end
        end
        % Parallelizing the block encoding process
        max_index = size(block_list_currframe,3);
        % Loop for processing blocks concurrently
        parfor block_index = 1:max_index
            % Function for inter-prediction
            [encoded_data, reconstructed_block, residual_block, bit_count_per_block] = paral_debug_funct(block_index, location_pointers, block_list_currframe, ref_frame_inter, block_size, srch_rng, QP, no_rows, no_cols, ref_frame_index_count, VBS_enable, Fast_ME_enable, Frac_ME_enable, lambda);
            % Buffering the output of each worker
            reconstructed_blocks(:,:,block_index) = reconstructed_block;
            residual_blocks_in_frame(:,:,block_index) = residual_block;
            encoded_data_per_frame(:,:, block_index) = encoded_data;
            total_bit_count_per_block(block_index) = bit_count_per_block;
        end
        % Processing the buffered outputs obtained after processing all the
        % blocks.
        for block_index = 1:size(block_list_currframe,3)
            % [row_start, row_end, col_start, col_end] = location_pointers(block_index,:);
            row_start = location_pointers(block_index, 1);
            row_end = location_pointers(block_index, 2);
            col_start = location_pointers(block_index, 3);
            col_end = location_pointers(block_index, 4);
            reconstructed_frames(row_start:row_end, col_start:col_end, k) = reconstructed_blocks(:,:,block_index);
            residual_blocks(:,:,block_index,k) = residual_blocks_in_frame(:,:,block_index);
            encoded_data_cell(:,:,block_index,k) = encoded_data_per_frame(:,:,block_index);
        end
        total_bit_count_per_frame(k) = sum(total_bit_count_per_block, 'all');
    end
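In the code above, the blocks do not need to communicate with each other. Now I need them to communicate at some point, because the processing of some blocks has to wait until a previous block has finished.
I think the figure below will help make this clearer.
I have learned that there are two types of parallel processing available, multithreading and multiprocessing. I think multithreading suits my use case. I have read about spmd and parfeval, but the examples I have come across are usually not very detailed. Since I am new to parallel processing, these options feel confusing and it is hard to choose which one to focus on. I think what I want is for the workers to be able to communicate with each other during execution? I am not sure. In case you need a rough idea of the data size:
video frame size = 288x352 (CIF format)
block size = 16
number of frames = 21
Thanks!
P.S. Sorry for the long post; I tried to explain it as clearly as possible.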
You can use parfor inside a non-parallel for loop, like this:
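    % The colors refer to the groups of mutually independent blocks in the
    % question's figure. extract_blocks_of_color, place_blocks and
    % process_based_on_previous_blocks are placeholders for your own code.
    previous_blocks = {};
    for color = ["green", "red", "blue"]
        % Cell array of all blocks with this color, taken from the frame
        input_blocks = extract_blocks_of_color(frame, color);
        processed_blocks = cell(1, numel(input_blocks));
        parfor i = 1:numel(input_blocks)
            processed_blocks{i} = process_based_on_previous_blocks(i, input_blocks{i}, previous_blocks);
        end
        previous_blocks = processed_blocks;
        % Put the processed blocks back at their original positions
        frame = place_blocks(frame, processed_blocks, color);
    end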
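When the dependency pattern is regular, the groups that the colors stand for can be computed instead of labelled by hand. The sketch below is one possible way to do that. It assumes (this is not stated in the question) that each block depends only on its left and top neighbours, so all blocks on the same anti-diagonal are independent of each other and form one group, or "wave". `wavefront_demo` and `encode_block_stub` are hypothetical names; the stub only stands in for a real per-block encoder such as `paral_debug_funct`, and the random frame stands in for one frame of `Y`.

```matlab
function [recon, frame, n_waves] = wavefront_demo()
% Sketch: run blocks wave by wave, with parfor inside each wave.
% ASSUMPTION: a block depends only on its left and top neighbours.
no_rows = 288; no_cols = 352; block_size = 16;   % CIF frame from the question
n_block_rows = no_rows / block_size;             % 18
n_block_cols = no_cols / block_size;             % 22

% wave(r, c) = r + c - 1, so every anti-diagonal gets one wave number.
[c_idx, r_idx] = meshgrid(1:n_block_cols, 1:n_block_rows);
wave = r_idx + c_idx - 1;
n_waves = max(wave(:));                          % 18 + 22 - 1 = 39

frame = randi([0 255], no_rows, no_cols);        % stand-in for one frame of Y
recon = zeros(no_rows, no_cols);                 % reconstructed frame so far

for w = 1:n_waves                                % serial loop over waves
    [rs, cs] = find(wave == w);                  % block coordinates in this wave
    n_in_wave = numel(rs);
    out = cell(1, n_in_wave);
    parfor i = 1:n_in_wave
        % recon is read-only here and already holds all earlier waves.
        out{i} = encode_block_stub(frame, recon, rs(i), cs(i), block_size);
    end
    for i = 1:n_in_wave                          % serial write-back
        rr = (rs(i) - 1) * block_size + (1:block_size);
        cc = (cs(i) - 1) * block_size + (1:block_size);
        recon(rr, cc) = out{i};
    end
end
end

function blk = encode_block_stub(frame, recon, r, c, bs)
% Hypothetical placeholder for the real encoder: predicts the block from
% an already reconstructed neighbour and reconstructs it losslessly.
rr = (r - 1) * bs + (1:bs);
cc = (c - 1) * bs + (1:bs);
pred = 128;                                      % no neighbour available
if c > 1
    pred = round(mean(recon(rr, cc - bs), 'all'));   % left neighbour
elseif r > 1
    pred = round(mean(recon(rr - bs, cc), 'all'));   % top neighbour
end
residual = frame(rr, cc) - pred;
blk = pred + residual;                           % prediction + residual
end
```

Saved as `wavefront_demo.m`, this can be called as `[recon, frame, n_waves] = wavefront_demo();`. Without an open parallel pool or without Parallel Computing Toolbox, the parfor loop simply runs serially. For a CIF frame with 16x16 blocks a wave never contains more than 18 blocks, and the first and last waves contain only one, so the speedup from this scheme is limited; a different dependency pattern would need a different `wave` matrix.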