Efficient way to find coordinates of connected blobs in binary image

Question

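I am looking for the coordinates of connected blobs in a binary image (a 2D numpy array of 0s and 1s).

The skimage library offers a very fast way to label blobs within the array (which I found from similar SO posts). However, I want lists of the coordinates of each blob, not a labelled array. I have a solution that extracts the coordinates from the labelled image, but it is very slow, far slower than the initial labelling.

Minimal reproducible example: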

import timeit
from skimage import measure
import numpy as np

binary_image = np.array([
        [0,1,0,0,1,1,0,1,1,0,0,1],
        [0,1,0,1,1,1,0,1,1,1,0,1],
        [0,0,0,0,0,0,0,1,1,1,0,0],
        [0,1,1,1,1,0,0,0,0,1,0,0],
        [0,0,0,0,0,0,0,1,1,1,0,0],
        [0,0,1,0,0,0,0,0,0,0,0,0],
        [0,1,0,0,1,1,0,1,1,0,0,1],
        [0,0,0,0,0,0,0,1,1,1,0,0],
        [0,1,1,1,1,0,0,0,0,1,0,0],
        ])

print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)

labels = measure.label(binary_image)

print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)

def extract_blobs_from_labelled_array(labelled_array):
    # The goal is to obtain lists of the coordinates
    # Of each distinct blob.

    blobs = []

    label = 1
    while True:
        indices_of_label = np.where(labelled_array==label)
        if not indices_of_label[0].size > 0:
            break
        else:
            blob = list(zip(*indices_of_label))
            label += 1
            blobs.append(blob)

    return blobs


if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)', 
            globals=globals(),
            number=1
            )
        )
    print("\n\n")

Output:

2d array of type: <class 'numpy.ndarray'>:
[[0 1 0 0 1 1 0 1 1 0 0 1]
 [0 1 0 1 1 1 0 1 1 1 0 1]
 [0 0 0 0 0 0 0 1 1 1 0 0]
 [0 1 1 1 1 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 1 1 1 0 0]
 [0 0 1 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 1 1 0 1 1 0 0 1]
 [0 0 0 0 0 0 0 1 1 1 0 0]
 [0 1 1 1 1 0 0 0 0 1 0 0]]


2d array with connected blobs labelled of type <class 'numpy.ndarray'>:
[[ 0  1  0  0  2  2  0  3  3  0  0  4]
 [ 0  1  0  2  2  2  0  3  3  3  0  4]
 [ 0  0  0  0  0  0  0  3  3  3  0  0]
 [ 0  5  5  5  5  0  0  0  0  3  0  0]
 [ 0  0  0  0  0  0  0  3  3  3  0  0]
 [ 0  0  6  0  0  0  0  0  0  0  0  0]
 [ 0  6  0  0  7  7  0  8  8  0  0  9]
 [ 0  0  0  0  0  0  0  8  8  8  0  0]
 [ 0 10 10 10 10  0  0  0  0  8  0  0]]


Beginning extract_blobs_from_labelled_array timing

Time taken:
9.346099977847189e-05

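9e-05 seconds is small, but so is the example image. In reality I am working with very high-resolution images, for which the function takes approximately 10 minutes.

Is there a faster way to do this?

Side note: I am only using list(zip()) to get the numpy coordinates into something I am used to (I don't use numpy much, just Python). Should I skip this and just use the coordinates to index as-is? Would that speed things up?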

Answer 1

The slow part of the code is here:

    while True:
        indices_of_label = np.where(labelled_array==label)
        if not indices_of_label[0].size > 0:
            break
        else:
            blob =list(zip(*indices_of_label))
            label+=1
            blobs.append(blob)

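First, a complete aside: you should avoid using while True when you know the number of elements you will be iterating over. It is a recipe for hard-to-find infinite-loop bugs.

Instead, you should use:

    for label in range(1, np.max(labelled_array) + 1):

(labels start at 1 because 0 is the background), and then you can drop the if ...: break.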

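The second issue is indeed that you are using list(zip(*)), which is slow compared to NumPy functions. Here you can get approximately the same result with np.transpose(indices_of_label), which gives you a 2D array of shape (n_coords, n_dim), that is, (n_coords, 2).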
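
As a minimal sketch of the two changes above (a bounded for loop and np.transpose instead of list(zip(*))), the loop could look like the following. The function name extract_blobs_loop and the small hand-labelled array are made up for illustration, and the sketch assumes labels run contiguously from 1 to the maximum, as measure.label produces. It still scans the image once per label, so it does not fix the main problem.

```python
import numpy as np

def extract_blobs_loop(labelled_array):
    """Per-label loop; still scans the whole image once per label."""
    blobs = []
    # Labels start at 1 (0 is background), so include the maximum label.
    for label in range(1, int(labelled_array.max()) + 1):
        indices_of_label = np.where(labelled_array == label)
        # Shape (n_coords, 2): one (row, col) pair per pixel of this blob.
        blobs.append(np.transpose(indices_of_label))
    return blobs

# Small hand-labelled array, used only for illustration.
labelled = np.array([
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 0, 0, 0],
])

blobs = extract_blobs_loop(labelled)
for blob in blobs:
    print(blob.tolist())
```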

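But the biggest issue is the expression labelled_array == label. This checks every pixel of the image once for every label. (Twice, actually, because you then run np.where(), which takes another pass.) This is a lot of unnecessary work, as the coordinates can be found in one pass.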
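
To illustrate why a per-label scan is unnecessary, here is a rough NumPy-only sketch that collects all foreground coordinates once and then groups them by label with a sort. This is not how any particular library does it internally; the function name group_coords_by_label is hypothetical, and the sort makes it a fixed number of passes rather than strictly one, but the cost no longer grows with the number of labels.

```python
import numpy as np

def group_coords_by_label(labelled_array):
    """Group foreground pixel coordinates by label without a per-label scan."""
    # One scan to find every foreground (non-zero) pixel.
    rows, cols = np.nonzero(labelled_array)
    if rows.size == 0:
        return []
    pixel_labels = labelled_array[rows, cols]
    # A stable sort keeps row-major order within each label.
    order = np.argsort(pixel_labels, kind="stable")
    coords = np.column_stack((rows[order], cols[order]))
    sorted_labels = pixel_labels[order]
    # Positions where the label changes mark the start of the next blob.
    boundaries = np.flatnonzero(np.diff(sorted_labels)) + 1
    return np.split(coords, boundaries)

# Small hand-labelled array, used only for illustration.
labelled = np.array([
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 0, 0, 0],
])

grouped = group_coords_by_label(labelled)
for blob in grouped:
    print(blob.tolist())
```

Each element of the returned list is an (n_coords, 2) array of (row, col) pairs, in increasing label order.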

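The scikit-image function skimage.measure.regionprops can do this for you. regionprops goes over the image once and returns a list containing one RegionProps object per label. The object has a .coords attribute containing the coordinates of each pixel in the blob. So, here is your code, modified to use that function: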

import timeit
from skimage import measure
import numpy as np

binary_image = np.array([
        [0,1,0,0,1,1,0,1,1,0,0,1],
        [0,1,0,1,1,1,0,1,1,1,0,1],
        [0,0,0,0,0,0,0,1,1,1,0,0],
        [0,1,1,1,1,0,0,0,0,1,0,0],
        [0,0,0,0,0,0,0,1,1,1,0,0],
        [0,0,1,0,0,0,0,0,0,0,0,0],
        [0,1,0,0,1,1,0,1,1,0,0,1],
        [0,0,0,0,0,0,0,1,1,1,0,0],
        [0,1,1,1,1,0,0,0,0,1,0,0],
        ])

print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)

labels = measure.label(binary_image)

print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)

def extract_blobs_from_labelled_array(labelled_array):
    """Return a list containing coordinates of pixels in each blob."""
    props = measure.regionprops(labelled_array)
    blobs = [p.coords for p in props]
    return blobs


if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)', 
            globals=globals(),
            number=1
            )
        )
    print("\n\n")
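
On the side question about indexing with the coordinates as-is: an (n_coords, 2) array like the ones in blobs can be used directly for NumPy indexing, so converting to a list of tuples is only needed if other code expects that form. A small sketch, where image and coords are made-up stand-ins for a real image and one blob's coordinates:

```python
import numpy as np

# Made-up image and one blob's coordinates, shaped (n_coords, 2).
image = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
])
coords = np.array([[0, 1], [1, 1]])

# Index with the row and column columns directly; no Python loop needed.
values = image[coords[:, 0], coords[:, 1]]

# The same indexing works for assignment, e.g. to paint the blob in a copy.
painted = image.copy()
painted[coords[:, 0], coords[:, 1]] = 5

# Convert to a list of (row, col) tuples only when plain Python is required.
as_tuples = [tuple(rc) for rc in coords.tolist()]

print(values.tolist())
print(as_tuples)
```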

Tags: python, numpy, scikit-image, binary-image