运行-MATLAB 中的长度解码
Run-length decoding in MATLAB
为了巧妙地使用线性索引或 accumarray
,我有时觉得有必要根据 run-length encoding 生成序列。由于没有内置函数,我要求最有效的方法来解码用 RLE 编码的序列。
规格:
为了进行公平比较,我想为函数设置一些规范:
- 如果指定了相同长度的可选第二个参数
values
,则输出应根据这些值,否则只是值 1:length(runLengths)
.
- 优雅地处理:
- 在
runLengths
中有零个
values
是元胞数组。
- 输出向量应具有与
runLengths
相同的 column/row 格式
简而言之:该函数应该等同于以下代码:
function V = runLengthDecode(runLengths, values)
[~,V] = histc(1:sum(runLengths), cumsum([1,runLengths(:).']));
if nargin>1
V = reshape(values(V), 1, []);
end
V = shiftdim(V, ~isrow(runLengths));
end
示例:
这里有几个测试用例
runLengthDecode([0,1,0,2])
runLengthDecode([0,1,0,4], [1,2,4,5].')
runLengthDecode([0,1,0,2].', [10,20,30,40])
runLengthDecode([0,3,1,0], {'a','b',1,2})
及其输出:
>> runLengthDecode([0,1,0,2])
ans =
2 4 4
>> runLengthDecode([0,1,0,4], [1,2,4,5].')
ans =
2 5 5 5 5
>> runLengthDecode([0,1,0,2].', [10,20,30,40])
ans =
20
40
40
>> runLengthDecode([0,3,1,0],{'a','b',1,2})
ans =
'b' 'b' 'b' [1]
方法一
这应该相当快。它用
bsxfun
创建大小为 numel(runLengths)
xnumel(runLengths)
的矩阵,因此它可能不适合巨大的输入大小。
function V = runLengthDecode(runLengths, values)
nn = 1:numel(runLengths);
if nargin==1 %// handle one-input case
values = nn;
end
V = values(nonzeros(bsxfun(@times, nn,...
bsxfun(@le, (1:max(runLengths)).', runLengths(:).'))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.';
end
方法二
这种方法,基于cumsum
, is an adaptation of that used in 。它比方法 1 使用更少的内存。
function V = runLengthDecode2(runLengths, values)
if nargin==1 %// handle one-input case
values = 1:numel(runLengths);
end
[ii, ~, jj] = find(runLengths(:));
V(cumsum(jj(end:-1:1))) = 1;
V = values(ii(cumsum(V(end:-1:1))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.';
end
为了找出最有效的解决方案,我们提供了一个评估性能的测试脚本。第一张图描绘了向量 runLengths
长度增长 运行 次,其中条目均匀分布,最大长度为 200。 最快,Divakar 的解决方案是第二名。
第二个图使用几乎相同的测试数据,只是它包含长度为 2000
的初始 运行。这主要影响两个 bsxfun
解决方案,而其他解决方案的性能非常相似。
测试表明 of gnovice's original answer 的性能最高。
如果您想自己进行速度比较,这里是用于生成上面的绘图的代码。
function theLastRunLengthDecodingComputationComparisonYoullEverNeed()
Funcs = {@knedlsepp0, ...
@LuisMendo1bsxfun, ...
@LuisMendo2cumsum, ...
@gnovice3cumsum, ...
@Divakar4replicate_bsxfunmask, ...
@knedlsepp5cumsumaccumarray
};
%% Growing number of runs, low maximum sizes in runLengths
ns = 2.^(1:25);
paramGenerators{1} = arrayfun(@(n) @(){randi(200,n,1)}, ns,'uni',0);
paramGenerators{2} = arrayfun(@(n) @(){[2000;randi(200,n,1)]}, ns,'uni',0);
for i = 1:2
times = compareFunctions(Funcs, paramGenerators{i}, 0.5);
finishedComputations = any(~isnan(times),2);
h = figure('Visible', 'off');
loglog(ns(finishedComputations), times(finishedComputations,:));
legend(cellfun(@func2str,Funcs,'uni',0),'Location','NorthWest','Interpreter','none');
title('Runtime comparison for run length decoding - Growing number of runs');
xlabel('length(runLengths)'); ylabel('seconds');
print(['-f',num2str(h)],'-dpng','-r100',['RunLengthComparsion',num2str(i)]);
end
end
function times = compareFunctions(Funcs, paramGenerators, timeLimitInSeconds)
if nargin<3
timeLimitInSeconds = Inf;
end
times = zeros(numel(paramGenerators),numel(Funcs));
for i = 1:numel(paramGenerators)
Params = feval(paramGenerators{i});
for j = 1:numel(Funcs)
if max(times(:,j))<timeLimitInSeconds
times(i,j) = timeit(@()feval(Funcs{j},Params{:}));
else
times(i,j) = NaN;
end
end
end
end
%% // #################################
%% // HERE COME ALL THE FANCY FUNCTIONS
%% // #################################
function V = knedlsepp0(runLengths, values)
[~,V] = histc(1:sum(runLengths), cumsum([1,runLengths(:).']));%'
if nargin>1
V = reshape(values(V), 1, []);
end
V = shiftdim(V, ~isrow(runLengths));
end
%% // #################################
function V = LuisMendo1bsxfun(runLengths, values)
nn = 1:numel(runLengths);
if nargin==1 %// handle one-input case
values = nn;
end
V = values(nonzeros(bsxfun(@times, nn,...
bsxfun(@le, (1:max(runLengths)).', runLengths(:).'))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.'; %'
end
end
%% // #################################
function V = LuisMendo2cumsum(runLengths, values)
if nargin==1 %// handle one-input case
values = 1:numel(runLengths);
end
[ii, ~, jj] = find(runLengths(:));
V(cumsum(jj(end:-1:1))) = 1;
V = values(ii(cumsum(V(end:-1:1))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.'; %'
end
end
%% // #################################
function V = gnovice3cumsum(runLengths, values)
isColumnVector = size(runLengths,1)>1;
if nargin==1 %// handle one-input case
values = 1:numel(runLengths);
end
values = reshape(values(runLengths~=0),1,[]);
if isempty(values) %// If there are no runs
V = []; return;
end
runLengths = nonzeros(runLengths(:));
index = zeros(1,sum(runLengths));
index(cumsum([1;runLengths(1:end-1)])) = 1;
V = values(cumsum(index));
if isColumnVector %// adjust orientation of output vector
V = V.'; %'
end
end
%% // #################################
function V = Divakar4replicate_bsxfunmask(runLengths, values)
if nargin==1 %// Handle one-input case
values = 1:numel(runLengths);
end
%// Do size checking to make sure that both values and runlengths are row vectors.
if size(values,1) > 1
values = values.'; %//'
end
if size(runLengths,1) > 1
yes_transpose_output = false;
runLengths = runLengths.'; %//'
else
yes_transpose_output = true;
end
maxlen = max(runLengths);
all_values = repmat(values,maxlen,1);
%// OR all_values = values(ones(1,maxlen),:);
V = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
%// Bring the shape of V back to the shape of runlengths
if yes_transpose_output
V = V.'; %//'
end
end
%% // #################################
function V = knedlsepp5cumsumaccumarray(runLengths, values)
isRowVector = size(runLengths,2)>1;
%// Actual computation using column vectors
V = cumsum(accumarray(cumsum([1; runLengths(:)]), 1));
V = V(1:end-1);
%// In case of second argument
if nargin>1
V = reshape(values(V),[],1);
end
%// If original was a row vector, transpose
if isRowVector
V = V.'; %'
end
end
此处介绍的解决方案基本上分两步完成 run-length decoding
-
- 复制所有
values
到最大数量 runLengths
。
- 使用
bsxfun
的屏蔽功能从每一列 select 对应 runlengths
。
函数代码中的其余内容将负责输入和输出大小以满足问题中设置的要求。
接下来列出的函数代码是 one of my previous answers to a similar problem 的 "cleaned-up" 版本。这是代码 -
function V = replicate_bsxfunmask(runLengths, values)
if nargin==1 %// Handle one-input case
values = 1:numel(runLengths);
end
%// Do size checking to make sure that both values and runlengths are row vectors.
if size(values,1) > 1
values = values.'; %//'
end
if size(runLengths,1) > 1
yes_transpose_output = false;
runLengths = runLengths.'; %//'
else
yes_transpose_output = true;
end
maxlen = max(runLengths);
all_values = repmat(values,maxlen,1);
%// OR all_values = values(ones(1,maxlen),:);
V = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
%// Bring the shape of V back to the shape of runlengths
if yes_transpose_output
V = V.'; %//'
end
return;
接下来列出的代码将是一个混合代码 (cumsum
+ replicate_bsxfunmask
),当您有大量异常值或非常大的异常值时将适用。同样为了简单起见,目前这仅适用于数字数组。这是实现 -
function out = replicate_bsxfunmask_v2(runLengths, values)
if nargin==1 %// Handle one-input case
values = 1:numel(runLengths);
end
if size(values,1) > 1
values = values.'; %//'
end
if size(runLengths,1) > 1
yes_transpose_output = true;
runLengths = runLengths.'; %//'
else
yes_transpose_output = false;
end
%// Regularize inputs
values = values(runLengths>0);
runLengths = runLengths(runLengths>0);
%// Main portion of code
thresh = 200; %// runlengths threshold that are to be processed with cumsum
crunLengths = cumsum(runLengths); %%// cumsums of runlengths
mask = runLengths >= thresh; %// mask of runlengths above threshold
starts = [1 crunLengths(1:end-1)+1]; %// starts of each group of runlengths
mask_ind = find(mask); %// indices of mask
post_mark = starts(mask);
negt_mark = crunLengths(mask)+1;
if ~isempty(negt_mark) && negt_mark(end) > crunLengths(end)
negt_mark(end) = [];
end
%// Create array & set starts markers for starts of runlengths above thresh
marked_out = zeros(1,crunLengths(end));
marked_out(post_mark) = mask_ind;
marked_out(negt_mark) = marked_out(negt_mark) -1*mask_ind(1:numel(negt_mark));
%// Setup output array with the cumsumed version of marked array
out = cumsum(marked_out);
%// Mask for final ouput to decide between large and small runlengths
thresh_mask = out~=0;
%// Fill output array with cumsum and then rep-bsxfun based approaches
out(thresh_mask) = values(out(thresh_mask));
values = values(~mask);
runLengths = runLengths(~mask);
maxlen = max(runLengths);
all_values = repmat(values,maxlen,1);
out(~thresh_mask) = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
if yes_transpose_output
out = out.'; %//'
end
return;
从 R2015a 开始,函数 repelem
是执行此操作的最佳选择:
function V = runLengthDecode(runLengths, values)
if nargin<2
values = 1:numel(runLengths);
end
V = repelem(values, runLengths);
end
对于 R2015a 之前的版本,这是一个快速的替代方案:
替代repelem
:
我觉得 gnovice 的方法可以通过使用比预处理掩码更好的零运行长度修复来改进。
所以我试了一下 accumarray
。看来这给了解决方案又一个提升:
function V = runLengthDecode(runLengths, values)
%// Actual computation using column vectors
V = cumsum(accumarray(cumsum([1; runLengths(:)]), 1));
V = V(1:end-1);
%// In case of second argument
if nargin>1
V = reshape(values(V),[],1);
end
%// If original was a row vector, transpose
if size(runLengths,2)>1
V = V.'; %'
end
end
为了巧妙地使用线性索引或 accumarray
,我有时觉得有必要根据 run-length encoding 生成序列。由于没有内置函数,我要求最有效的方法来解码用 RLE 编码的序列。
规格:
为了进行公平比较,我想为函数设置一些规范:
- 如果指定了相同长度的可选第二个参数
values
,则输出应根据这些值,否则只是值1:length(runLengths)
. - 优雅地处理:
- 在
runLengths
中有零个
values
是元胞数组。
- 在
- 输出向量应具有与
runLengths
相同的 column/row 格式
简而言之:该函数应该等同于以下代码:
function V = runLengthDecode(runLengths, values)
[~,V] = histc(1:sum(runLengths), cumsum([1,runLengths(:).']));
if nargin>1
V = reshape(values(V), 1, []);
end
V = shiftdim(V, ~isrow(runLengths));
end
示例:
这里有几个测试用例
runLengthDecode([0,1,0,2])
runLengthDecode([0,1,0,4], [1,2,4,5].')
runLengthDecode([0,1,0,2].', [10,20,30,40])
runLengthDecode([0,3,1,0], {'a','b',1,2})
及其输出:
>> runLengthDecode([0,1,0,2])
ans =
2 4 4
>> runLengthDecode([0,1,0,4], [1,2,4,5].')
ans =
2 5 5 5 5
>> runLengthDecode([0,1,0,2].', [10,20,30,40])
ans =
20
40
40
>> runLengthDecode([0,3,1,0],{'a','b',1,2})
ans =
'b' 'b' 'b' [1]
方法一
这应该相当快。它用
bsxfun
创建大小为 numel(runLengths)
xnumel(runLengths)
的矩阵,因此它可能不适合巨大的输入大小。
function V = runLengthDecode(runLengths, values)
nn = 1:numel(runLengths);
if nargin==1 %// handle one-input case
values = nn;
end
V = values(nonzeros(bsxfun(@times, nn,...
bsxfun(@le, (1:max(runLengths)).', runLengths(:).'))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.';
end
方法二
这种方法,基于cumsum
, is an adaptation of that used in
function V = runLengthDecode2(runLengths, values)
if nargin==1 %// handle one-input case
values = 1:numel(runLengths);
end
[ii, ~, jj] = find(runLengths(:));
V(cumsum(jj(end:-1:1))) = 1;
V = values(ii(cumsum(V(end:-1:1))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.';
end
为了找出最有效的解决方案,我们提供了一个评估性能的测试脚本。第一张图描绘了向量 runLengths
长度增长 运行 次,其中条目均匀分布,最大长度为 200。
第二个图使用几乎相同的测试数据,只是它包含长度为 2000
的初始 运行。这主要影响两个 bsxfun
解决方案,而其他解决方案的性能非常相似。
测试表明
如果您想自己进行速度比较,这里是用于生成上面的绘图的代码。
function theLastRunLengthDecodingComputationComparisonYoullEverNeed()
Funcs = {@knedlsepp0, ...
@LuisMendo1bsxfun, ...
@LuisMendo2cumsum, ...
@gnovice3cumsum, ...
@Divakar4replicate_bsxfunmask, ...
@knedlsepp5cumsumaccumarray
};
%% Growing number of runs, low maximum sizes in runLengths
ns = 2.^(1:25);
paramGenerators{1} = arrayfun(@(n) @(){randi(200,n,1)}, ns,'uni',0);
paramGenerators{2} = arrayfun(@(n) @(){[2000;randi(200,n,1)]}, ns,'uni',0);
for i = 1:2
times = compareFunctions(Funcs, paramGenerators{i}, 0.5);
finishedComputations = any(~isnan(times),2);
h = figure('Visible', 'off');
loglog(ns(finishedComputations), times(finishedComputations,:));
legend(cellfun(@func2str,Funcs,'uni',0),'Location','NorthWest','Interpreter','none');
title('Runtime comparison for run length decoding - Growing number of runs');
xlabel('length(runLengths)'); ylabel('seconds');
print(['-f',num2str(h)],'-dpng','-r100',['RunLengthComparsion',num2str(i)]);
end
end
function times = compareFunctions(Funcs, paramGenerators, timeLimitInSeconds)
if nargin<3
timeLimitInSeconds = Inf;
end
times = zeros(numel(paramGenerators),numel(Funcs));
for i = 1:numel(paramGenerators)
Params = feval(paramGenerators{i});
for j = 1:numel(Funcs)
if max(times(:,j))<timeLimitInSeconds
times(i,j) = timeit(@()feval(Funcs{j},Params{:}));
else
times(i,j) = NaN;
end
end
end
end
%% // #################################
%% // HERE COME ALL THE FANCY FUNCTIONS
%% // #################################
function V = knedlsepp0(runLengths, values)
[~,V] = histc(1:sum(runLengths), cumsum([1,runLengths(:).']));%'
if nargin>1
V = reshape(values(V), 1, []);
end
V = shiftdim(V, ~isrow(runLengths));
end
%% // #################################
function V = LuisMendo1bsxfun(runLengths, values)
nn = 1:numel(runLengths);
if nargin==1 %// handle one-input case
values = nn;
end
V = values(nonzeros(bsxfun(@times, nn,...
bsxfun(@le, (1:max(runLengths)).', runLengths(:).'))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.'; %'
end
end
%% // #################################
function V = LuisMendo2cumsum(runLengths, values)
if nargin==1 %// handle one-input case
values = 1:numel(runLengths);
end
[ii, ~, jj] = find(runLengths(:));
V(cumsum(jj(end:-1:1))) = 1;
V = values(ii(cumsum(V(end:-1:1))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
V = V.'; %'
end
end
%% // #################################
function V = gnovice3cumsum(runLengths, values)
isColumnVector = size(runLengths,1)>1;
if nargin==1 %// handle one-input case
values = 1:numel(runLengths);
end
values = reshape(values(runLengths~=0),1,[]);
if isempty(values) %// If there are no runs
V = []; return;
end
runLengths = nonzeros(runLengths(:));
index = zeros(1,sum(runLengths));
index(cumsum([1;runLengths(1:end-1)])) = 1;
V = values(cumsum(index));
if isColumnVector %// adjust orientation of output vector
V = V.'; %'
end
end
%% // #################################
function V = Divakar4replicate_bsxfunmask(runLengths, values)
if nargin==1 %// Handle one-input case
values = 1:numel(runLengths);
end
%// Do size checking to make sure that both values and runlengths are row vectors.
if size(values,1) > 1
values = values.'; %//'
end
if size(runLengths,1) > 1
yes_transpose_output = false;
runLengths = runLengths.'; %//'
else
yes_transpose_output = true;
end
maxlen = max(runLengths);
all_values = repmat(values,maxlen,1);
%// OR all_values = values(ones(1,maxlen),:);
V = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
%// Bring the shape of V back to the shape of runlengths
if yes_transpose_output
V = V.'; %//'
end
end
%% // #################################
function V = knedlsepp5cumsumaccumarray(runLengths, values)
isRowVector = size(runLengths,2)>1;
%// Actual computation using column vectors
V = cumsum(accumarray(cumsum([1; runLengths(:)]), 1));
V = V(1:end-1);
%// In case of second argument
if nargin>1
V = reshape(values(V),[],1);
end
%// If original was a row vector, transpose
if isRowVector
V = V.'; %'
end
end
此处介绍的解决方案基本上分两步完成 run-length decoding
-
- 复制所有
values
到最大数量runLengths
。 - 使用
bsxfun
的屏蔽功能从每一列 select 对应runlengths
。
函数代码中的其余内容将负责输入和输出大小以满足问题中设置的要求。
接下来列出的函数代码是 one of my previous answers to a similar problem 的 "cleaned-up" 版本。这是代码 -
function V = replicate_bsxfunmask(runLengths, values)
if nargin==1 %// Handle one-input case
values = 1:numel(runLengths);
end
%// Do size checking to make sure that both values and runlengths are row vectors.
if size(values,1) > 1
values = values.'; %//'
end
if size(runLengths,1) > 1
yes_transpose_output = false;
runLengths = runLengths.'; %//'
else
yes_transpose_output = true;
end
maxlen = max(runLengths);
all_values = repmat(values,maxlen,1);
%// OR all_values = values(ones(1,maxlen),:);
V = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
%// Bring the shape of V back to the shape of runlengths
if yes_transpose_output
V = V.'; %//'
end
return;
接下来列出的代码将是一个混合代码 (cumsum
+ replicate_bsxfunmask
),当您有大量异常值或非常大的异常值时将适用。同样为了简单起见,目前这仅适用于数字数组。这是实现 -
function out = replicate_bsxfunmask_v2(runLengths, values)
if nargin==1 %// Handle one-input case
values = 1:numel(runLengths);
end
if size(values,1) > 1
values = values.'; %//'
end
if size(runLengths,1) > 1
yes_transpose_output = true;
runLengths = runLengths.'; %//'
else
yes_transpose_output = false;
end
%// Regularize inputs
values = values(runLengths>0);
runLengths = runLengths(runLengths>0);
%// Main portion of code
thresh = 200; %// runlengths threshold that are to be processed with cumsum
crunLengths = cumsum(runLengths); %%// cumsums of runlengths
mask = runLengths >= thresh; %// mask of runlengths above threshold
starts = [1 crunLengths(1:end-1)+1]; %// starts of each group of runlengths
mask_ind = find(mask); %// indices of mask
post_mark = starts(mask);
negt_mark = crunLengths(mask)+1;
if ~isempty(negt_mark) && negt_mark(end) > crunLengths(end)
negt_mark(end) = [];
end
%// Create array & set starts markers for starts of runlengths above thresh
marked_out = zeros(1,crunLengths(end));
marked_out(post_mark) = mask_ind;
marked_out(negt_mark) = marked_out(negt_mark) -1*mask_ind(1:numel(negt_mark));
%// Setup output array with the cumsumed version of marked array
out = cumsum(marked_out);
%// Mask for final ouput to decide between large and small runlengths
thresh_mask = out~=0;
%// Fill output array with cumsum and then rep-bsxfun based approaches
out(thresh_mask) = values(out(thresh_mask));
values = values(~mask);
runLengths = runLengths(~mask);
maxlen = max(runLengths);
all_values = repmat(values,maxlen,1);
out(~thresh_mask) = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
if yes_transpose_output
out = out.'; %//'
end
return;
从 R2015a 开始,函数 repelem
是执行此操作的最佳选择:
function V = runLengthDecode(runLengths, values)
if nargin<2
values = 1:numel(runLengths);
end
V = repelem(values, runLengths);
end
对于 R2015a 之前的版本,这是一个快速的替代方案:
替代repelem
:
我觉得 gnovice 的方法可以通过使用比预处理掩码更好的零运行长度修复来改进。
所以我试了一下 accumarray
。看来这给了解决方案又一个提升:
function V = runLengthDecode(runLengths, values)
%// Actual computation using column vectors
V = cumsum(accumarray(cumsum([1; runLengths(:)]), 1));
V = V(1:end-1);
%// In case of second argument
if nargin>1
V = reshape(values(V),[],1);
end
%// If original was a row vector, transpose
if size(runLengths,2)>1
V = V.'; %'
end
end