Connected component labeling for arrays / quasi-images with many dimensions

Problem

I am trying to perform connected component labeling on arrays with more than 3 dimensions. By that I mean my boolean array has a .shape such as (5,2,3,6,10), which would be 5 dimensions.

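For 2D images (as opposed to my >3D problem), connected component labeling labels the connected regions (hypervolumes in my case). Two (hyper-)pixels are connected if they are adjacent to each other and both are True in the boolean array.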
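To make "adjacent" concrete in n dimensions, here is a minimal sketch that enumerates the neighbour offsets of a single pixel. The helper name neighbour_offsets and its full flag are made up for this illustration; full=False counts only neighbours that differ by one step along a single axis (the analogue of 4-connectivity in 2D), while full=True also counts diagonal neighbours (the analogue of 8-connectivity).

```python
import itertools


def neighbour_offsets(ndim, full=False):
    """Offsets from a pixel to its neighbours in ndim dimensions (hypothetical helper)."""
    offsets = []
    for offset in itertools.product((-1, 0, 1), repeat=ndim):
        steps = sum(abs(o) for o in offset)  # Manhattan distance from the pixel
        if steps == 0:
            continue  # the pixel itself is not its own neighbour
        if full or steps == 1:
            offsets.append(offset)
    return offsets


print(len(neighbour_offsets(2)))             # 4
print(len(neighbour_offsets(2, full=True)))  # 8
print(len(neighbour_offsets(5)))             # 10
print(len(neighbour_offsets(5, full=True)))  # 242
```

With face-only adjacency a pixel has 2*ndim neighbours, and with diagonals it has 3**ndim - 1, so the choice matters more as the number of dimensions grows.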

What I have tried

For 2 dimensions this can be done with OpenCV, and for up to 3 dimensions it can be done with scikit-image's skimage.measure.label. However, I am not sure how to handle my case.


Further material for the interested reader (although it did not help with my problem):

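If the n-dimensional analogue of 4-connectivity in 2D is sufficient, you can use a nearest-neighbour tree to find, in n log n time, the neighbouring pixels that are also foreground. After that it is a matter of building a graph and finding its connected components (also n log n, IIRC).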

#!/usr/bin/env python
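"""
Label the connected components of an n-dimensional boolean array
using a k-d tree for the neighbour search and a graph for the components.
"""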
import numpy as np
import networkx as nx

from scipy.spatial import cKDTree


def get_components(boolean_array):
    # find neighbours
    coordinates = list(zip(*np.where(boolean_array)))
    tree = cKDTree(coordinates)
    neighbours_by_pixel = tree.query_ball_tree(tree, r=1, p=1) # p=1 -> Manhattan distance; r=1 -> what would be 4-connectivity in 2D

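    # create graph and find components
    G = nx.Graph()
    G.add_nodes_from(range(len(coordinates)))  # isolated pixels need a label too
    for ii, neighbours in enumerate(neighbours_by_pixel):
        # each list contains the pixel itself and is not guaranteed to be sorted;
        # keeping only jj > ii skips self-loops and duplicate edges
        G.add_edges_from((ii, jj) for jj in neighbours if jj > ii)
    components = nx.connected_components(G)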

    # create output image
    output = np.zeros_like(boolean_array, dtype=int)  # np.int was removed from NumPy
    for ii, component in enumerate(components):
        for idx in component:
            output[coordinates[idx]] = ii+1

    return output


if __name__ == '__main__':

    shape = (5, 2, 3, 6, 10)
    D = len(shape)
    data = np.random.rand(*shape) < 0.1
    output = get_components(data)

For an array of shape (50, 50, 50, 50), I get the following timing on my laptop:

In [48]: %timeit output = get_components(data)
5.85 s ± 279 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
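A variation on the build-a-graph idea above, as a minimal sketch only: instead of a k-d tree, neighbouring foreground pixels are paired by comparing the array with itself shifted by one step along each axis, and scipy.sparse.csgraph.connected_components assigns the labels. The function name label_by_axis_shifts is made up for this illustration, it handles only face adjacency, and no timing was measured for it.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def label_by_axis_shifts(boolean_array):
    """Label face-connected regions of an n-D boolean array (hypothetical helper)."""
    boolean_array = np.asarray(boolean_array, dtype=bool)
    n_fg = int(boolean_array.sum())
    output = np.zeros(boolean_array.shape, dtype=int)
    if n_fg == 0:
        return output  # nothing to label

    # give every foreground pixel a node id; background stays at -1
    ids = np.full(boolean_array.shape, -1, dtype=np.intp)
    ids[boolean_array] = np.arange(n_fg)

    rows, cols = [], []
    for axis in range(boolean_array.ndim):
        # pair each pixel with its successor along this axis
        lo = [slice(None)] * boolean_array.ndim
        hi = [slice(None)] * boolean_array.ndim
        lo[axis] = slice(None, -1)
        hi[axis] = slice(1, None)
        a, b = ids[tuple(lo)], ids[tuple(hi)]
        both = (a >= 0) & (b >= 0)  # an edge exists only if both pixels are foreground
        rows.append(a[both])
        cols.append(b[both])

    rows = np.concatenate(rows)
    cols = np.concatenate(cols)
    graph = coo_matrix(
        (np.ones(rows.size, dtype=np.int8), (rows, cols)), shape=(n_fg, n_fg)
    )
    _, component = connected_components(graph, directed=False)
    output[boolean_array] = component + 1  # 0 is reserved for background
    return output


data = np.random.rand(5, 2, 3, 6, 10) < 0.1
output = label_by_axis_shifts(data)
```

Label numbers may differ from those of other implementations, but the grouping of pixels into components should be the same for face adjacency.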

scipy.ndimage.label does directly what you want:

In [1]: import numpy as np
In [2]: arr = np.random.random((5,2,3,6,10)) > 0.5
In [3]: from scipy import ndimage as ndi
In [4]: labeled, n = ndi.label(arr)
In [5]: n
Out[5]: 11
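By default, scipy.ndimage.label treats only pixels that share a face as connected. If diagonal neighbours should count as well, a structuring element can be passed through the structure argument. A small sketch on a hand-built 4D array follows; the array contents and variable names are chosen only for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

arr = np.zeros((4, 4, 4, 4), dtype=bool)
arr[0, 0, 0, 0] = True
arr[1, 1, 1, 1] = True  # touches the first pixel only diagonally
arr[1, 1, 1, 2] = True  # shares a face with the second pixel
arr[3, 3, 3, 0] = True  # far away from everything else

# default structure: face adjacency only
faces_only, n_faces = ndi.label(arr)

# connectivity equal to ndim: diagonal neighbours are connected too
full = ndi.generate_binary_structure(arr.ndim, arr.ndim)
with_diagonals, n_full = ndi.label(arr, structure=full)

print(n_faces, n_full)  # 3 2
```

The returned label array has the same shape as the input, with 0 for background and 1..n for the components.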