Faster concatenation of cell arrays of different sizes
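I have a cell array of size m x 1, and each cell is itself an s x t cell array (the sizes vary). I would like to concatenate them vertically. The code is as follows:
function cell_out = vert_cat(cell_in)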
[row,col] = cellfun(@size,cell_in,'Uni',0);
fcn_vert = @(x)([x,repmat({''},size(x,1),max(cell2mat(col))-size(x,2))]);
cell_out = cellfun(fcn_vert,cell_in,'Uni',0); % this step takes most of the time
cell_out = vertcat(cell_out{:});
end
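The third step (the cellfun call that pads each cell) takes a lot of time. Is this the right way to do it, or is there a faster way to achieve this?
I used this code to generate data: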
%generating some dummy data
m=1000;
s=100;
t=100;
cell_in=cell(m,1);
for idx=1:m
cell_in{idx}=cell(randi(s),randi(t));
end
With a few minor modifications, I was able to speed up the code by a factor of 5:
%Minor modifications of the original code
%use arrays instead of cells for row and col
[row,col] = cellfun(@size,cell_in);
%calculate max(col) once
tcol=max(col);
%use cell instead of repmat to generate an empty cell
fcn_vert = @(x)([x,cell(size(x,1),tcol-size(x,2))]);
cell_out = cellfun(fcn_vert,cell_in,'Uni',0); % Taking up lot of time
cell_out = vertcat(cell_out{:});
Simply using a for loop is faster still, because each piece of data is moved only once:
%new approach. Basic idea: move every piece of data only once
[row,col] = cellfun(@size,cell_in);
trow=sum(row);
tcol=max(col);
r=1;
cell_out2 = cell(trow,tcol);
for idx=1:numel(cell_in)
cell_out2(r:r+row(idx)-1,1:col(idx))=cell_in{idx};
r=r+row(idx);
end
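As a rough way to compare the two variants above, the sketch below runs both on a smaller random input, checks that they produce the same result, and times each with tic/toc. The sizes (m = 200, s = t = 50) and the names padded, out_cellfun, out_loop, t_cellfun and t_loop are choices made for this illustration only. Timings depend on the machine and the MATLAB release, so no numbers are quoted here.

```matlab
% Illustrative comparison; sizes and variable names are assumptions
m = 200; s = 50; t = 50;
cell_in = cell(m,1);
for idx = 1:m
    cell_in{idx} = cell(randi(s), randi(t));
end

% row and column counts of every block, as numeric arrays
[row, col] = cellfun(@size, cell_in);
tcol = max(col);

% Variant 1: pad each block with cellfun, then concatenate
tic
fcn_vert = @(x) [x, cell(size(x,1), tcol - size(x,2))];
padded = cellfun(fcn_vert, cell_in, 'UniformOutput', false);
out_cellfun = vertcat(padded{:});
t_cellfun = toc;

% Variant 2: preallocate once and copy each block into place
tic
out_loop = cell(sum(row), tcol);
r = 1;
for idx = 1:numel(cell_in)
    out_loop(r:r+row(idx)-1, 1:col(idx)) = cell_in{idx};
    r = r + row(idx);
end
t_loop = toc;

% both variants should give the same padded result
assert(isequal(out_cellfun, out_loop));
fprintf('cellfun: %.4f s, loop: %.4f s\n', t_cellfun, t_loop);
```

For more stable measurements, each variant could be wrapped in a function and passed to timeit instead of relying on a single tic/toc pair.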
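cellfun has been found to be slower than loops (the reference is somewhat old, but it agrees with what I have seen). In addition, repmat has also been a performance hit in the past (though that may be different now).
Try this two-loop code, which is intended to accomplish your task: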
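Whether repmat still costs anything can be checked directly. The following is a minimal micro-benchmark sketch using timeit; the 100-by-100 padding size and the variable names are assumptions made for illustration. It also shows that the two ways of building the padding are not identical: repmat({''},...) fills each element with an empty char, while cell(...) fills it with an empty double.

```matlab
% Illustrative micro-benchmark; the 100-by-100 size is an assumption
n = 100;
t_repmat = timeit(@() repmat({''}, n, n));  % padding filled with '' (empty char)
t_cell   = timeit(@() cell(n, n));          % padding filled with [] (empty double)
fprintf('repmat: %.2e s, cell: %.2e s\n', t_repmat, t_cell);

% The two paddings differ in what each element contains
p_repmat = repmat({''}, 2, 3);
p_cell   = cell(2, 3);
assert(ischar(p_repmat{1}) && isempty(p_repmat{1}));
assert(~ischar(p_cell{1}) && isempty(p_cell{1}));
```

If later code relies on the padding being char (for example when the result is treated as a cell array of strings), the cell(...) variant would need the empty elements replaced afterwards.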
function cellOut = vert_cat(c)
nElem = length(c);
colPad = zeros(nElem,1);
nRow = zeros(nElem,1);
for k = 1:nElem
[nRow(k),colPad(k)] = size(c{k});
end
colMax = max(colPad);
colPad = colMax - colPad;
cellOut = cell(sum(nRow),colMax);
bottom = cumsum(nRow) - nRow + 1;
top = bottom + nRow - 1;
for k = 1:nElem
cellOut(bottom(k):top(k),:) = [c{k},cell(nRow(k),colPad(k))];
end
end
My test for this code was:
A = rand(20,20);
A = mat2cell(A,ones(20,1),ones(20,1));
C = arrayfun(@(c) A(1:c,1:c),randi([1,15],1,5),'UniformOutput',false);
ccat = vert_cat(C);
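Because the test above uses random sizes, a small deterministic input makes the expected padding easier to see. The sketch below assumes the vert_cat function from this answer is saved as vert_cat.m on the MATLAB path; the names C_small, expected and result are hypothetical and used only for this check.

```matlab
% Deterministic check; assumes vert_cat.m from the answer above is on the path
C_small = {{1,2;3,4}; {5}; {6,7,8}};   % blocks of size 2x2, 1x1 and 1x3

% expected result: blocks stacked vertically, short rows padded with []
expected = {1, 2, []; ...
            3, 4, []; ...
            5, [], []; ...
            6, 7, 8};

result = vert_cat(C_small);
assert(isequal(size(result), [4 3]));
assert(isequal(result, expected));
```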