Faster concatenation of cell arrays of different sizes

Question

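I have a cell array of size m x 1, and each cell is itself an s x t cell array (the sizes vary). I would like to concatenate them vertically. The code is as follows: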

function cell_out = vert_cat(cell_in)
    [row,col] = cellfun(@size,cell_in,'Uni',0);
    fcn_vert = @(x)([x,repmat({''},size(x,1),max(cell2mat(col))-size(x,2))]);
    cell_out = cellfun(fcn_vert,cell_in,'Uni',0); % Taking up lot of time
    cell_out = vertcat(cell_out{:});
end

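Step 3 (the cellfun call marked in the comment) takes a lot of time. Is this the right approach, or is there a faster way to achieve this?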

Answer 1

I generated the data with this code:

%generating some dummy data
m=1000;
s=100;
t=100;
cell_in=cell(m,1);
for idx=1:m
    cell_in{idx}=cell(randi(s),randi(t));
end

With some minor modifications, I was able to speed up the code by a factor of 5:

%Minor modifications of the original code
%use arrays instead of cells for row and col
[row,col] = cellfun(@size,cell_in);
%calculate max(col) once
tcol=max(col);
%use cell instead of repmat to generate an empty cell
fcn_vert = @(x)([x,cell(size(x,1),tcol-size(x,2))]);
cell_out = cellfun(fcn_vert,cell_in,'Uni',0);
cell_out = vertcat(cell_out{:});

Simply using a for loop is faster still, because each piece of data is moved only once:

%new approach. Basic idea: move every piece of data only once
    [row,col] = cellfun(@size,cell_in);
    trow=sum(row);
    tcol=max(col);
    r=1;
    cell_out2 = cell(trow,tcol);
    for idx=1:numel(cell_in)
        cell_out2(r:r+row(idx)-1,1:col(idx))=cell_in{idx};
        r=r+row(idx);
    end
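
As a quick sanity check, the two versions above can be run on a small input and compared. This is only a sketch: the three-element cell_in below is a hypothetical example, not the data from the question, and the intermediate variable tmp is introduced here for readability.

```matlab
% Hypothetical input: three cell arrays of different sizes (2x2, 1x3, 2x1)
cell_in = {num2cell(magic(2)); {'a','b','c'}; cell(2,1)};

[row,col] = cellfun(@size,cell_in);
tcol = max(col);

% cellfun-based version: pad each block on the right, then stack
fcn_vert = @(x)([x,cell(size(x,1),tcol-size(x,2))]);
tmp = cellfun(fcn_vert,cell_in,'UniformOutput',false);
cell_out = vertcat(tmp{:});

% loop-based version: preallocate once, copy each block into place
cell_out2 = cell(sum(row),tcol);
r = 1;
for idx = 1:numel(cell_in)
    cell_out2(r:r+row(idx)-1,1:col(idx)) = cell_in{idx};
    r = r+row(idx);
end

% both results should have the same size and contents
same = isequal(cell_out,cell_out2);
```

If same is false for some input, the two versions disagree on that input and the faster one should not be swapped in without checking why.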

Answer 2

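cellfun has been found to be slower than loops (the reference is a bit old, but it agrees with what I have seen). In addition, repmat has also been a performance hit in the past (although that may be different now). Try this two-loop code, which aims to accomplish your task: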

function cellOut = vert_cat(c)

    nElem  = length(c);
    colPad = zeros(nElem,1);
    nRow   = zeros(nElem,1);
    for k = 1:nElem
        [nRow(k),colPad(k)] = size(c{k});
    end
    colMax = max(colPad);
    colPad = colMax - colPad;

    cellOut = cell(sum(nRow),colMax);
    bottom  = cumsum(nRow) - nRow + 1;
    top     = bottom + nRow - 1;
    for k = 1:nElem
        cellOut(bottom(k):top(k),:) = [c{k},cell(nRow(k),colPad(k))];
    end

end

My test of this code was:

A = rand(20,20);
A = mat2cell(A,ones(20,1),ones(20,1));
C = arrayfun(@(c) A(1:c,1:c),randi([1,15],1,5),'UniformOutput',false);
ccat = vert_cat(C);
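
One difference from the code in the question: cell(...) fills the padding with empty doubles ([]), whereas the question padded with empty character arrays (''). If the '' padding matters, one possible variation is to preallocate the output with repmat({''},...) once and copy each block into it. The sketch below assumes a small hypothetical input c, and the names nCol and first are introduced here for illustration only.

```matlab
% Hypothetical input: three cell arrays of different widths
c = {{'a','b'}; {'c'}; {'d','e','f'}};

nElem = numel(c);
nRow  = zeros(nElem,1);
nCol  = zeros(nElem,1);
for k = 1:nElem
    [nRow(k),nCol(k)] = size(c{k});
end

% Preallocate with '' so cells that are never overwritten hold empty char arrays
cellOut = repmat({''},sum(nRow),max(nCol));
first   = cumsum(nRow) - nRow + 1;   % first output row of each block
for k = 1:nElem
    cellOut(first(k):first(k)+nRow(k)-1,1:nCol(k)) = c{k};
end
```

Here repmat is called once for the whole output rather than once per element, so its cost should matter far less than in the original per-cell padding.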

performance

matlab

cell-array