Removing completely isolated cells from Python array?

Question

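I'm trying to reduce noise in a binary Python array by removing all completely isolated single cells, i.e. setting "1"-valued cells to 0 if they are completely surrounded by "0"s. I have a working solution that uses a loop to remove blobs of size 1, but this seems very inefficient for large arrays: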

import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt    

# Generate sample data
square = np.zeros((32, 32))
square[10:-10, 10:-10] = 1
np.random.seed(12)
x, y = (32*np.random.random((2, 20))).astype(int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
square[x, y] = 1

# Plot original data with many isolated single cells
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

# Assign unique labels
id_regions, number_of_ids = ndimage.label(square, structure=np.ones((3,3)))

# Set blobs of size 1 to 0
for i in range(number_of_ids + 1):  # range instead of xrange so the loop also runs on Python 3
    if id_regions[id_regions==i].size == 1:
        square[id_regions==i] = 0

# Plot desired output, with all isolated single cells removed
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

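Eroding and dilating my array won't work in this case, because that would also remove features with a width of 1. I feel the solution lies somewhere in the scipy.ndimage package, but so far I haven't been able to crack it. Any help would be greatly appreciated!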
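To make the objection to erosion followed by dilation concrete, here is a minimal sketch on a made-up test image (the `demo` array and its coordinates are assumptions for illustration, not part of the sample data above). It contains one width-1 line that should survive and one isolated cell that should not:

```python
import numpy as np
import scipy.ndimage as ndimage

# Hypothetical test image: a width-1 horizontal line plus one isolated cell
demo = np.zeros((16, 16), dtype=bool)
demo[5, 2:13] = True   # width-1 feature that should be kept
demo[12, 8] = True     # isolated cell that should be removed

struct = np.ones((3, 3), dtype=bool)

# Erosion keeps a cell only if its whole 3x3 neighborhood is set,
# so both the isolated cell and the thin line disappear
eroded = ndimage.binary_erosion(demo, structure=struct)

# Dilation cannot bring back what erosion removed completely
restored = ndimage.binary_dilation(eroded, structure=struct)

print(demo.sum(), eroded.sum(), restored.sum())
```

The isolated cell is gone after this round trip, but so is the line, which is exactly the side effect described above.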

Answer 1

You can check the neighbors manually and avoid the loop by using vectorization.

has_neighbor = np.zeros(square.shape, bool)
has_neighbor[:, 1:] = np.logical_or(has_neighbor[:, 1:], square[:, :-1] > 0)  # left
has_neighbor[:, :-1] = np.logical_or(has_neighbor[:, :-1], square[:, 1:] > 0)  # right
has_neighbor[1:, :] = np.logical_or(has_neighbor[1:, :], square[:-1, :] > 0)  # above
has_neighbor[:-1, :] = np.logical_or(has_neighbor[:-1, :], square[1:, :] > 0)  # below

square[np.logical_not(has_neighbor)] = 0

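This way the looping over square is performed internally by numpy, which is more efficient than looping in Python. This solution has two drawbacks:

If your array is very sparse, there may be more efficient ways to check the neighborhood of the non-zero points.
If your array is very large, the has_neighbor array might consume too much memory. In that case you could loop over smaller sub-arrays (a trade-off between Python loops and vectorization).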
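For the sparse case, one possible sketch is to look up neighbors only at the coordinates of the non-zero cells. The function name `remove_isolated_sparse` is hypothetical, and unlike the snippet above it checks all 8 neighbors (including diagonals). It still builds one padded boolean copy of the array, so it reduces the neighbor lookups rather than the memory use:

```python
import numpy as np

def remove_isolated_sparse(arr):
    """Zero non-zero cells that have no non-zero 8-neighbor.

    Neighbor lookups are done only at the non-zero coordinates.
    """
    # Pad with False so that border cells can be indexed without special cases
    padded = np.pad(arr > 0, 1, mode='constant', constant_values=False)
    rows, cols = np.nonzero(arr)
    # The same cells, expressed in padded coordinates
    pr, pc = rows + 1, cols + 1

    neighbor_count = np.zeros(rows.shape, dtype=int)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            neighbor_count += padded[pr + dr, pc + dc]

    out = arr.copy()
    isolated = neighbor_count == 0
    out[rows[isolated], cols[isolated]] = 0
    return out
```

The Python loop runs only eight times regardless of array size; the per-cell work stays inside numpy.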

I have no experience with ndimage, so there may be a better built-in solution.

Answer 2

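The typical way of getting rid of isolated pixels in image processing is a morphological opening, for which you have a ready-made implementation in scipy.ndimage.morphology.binary_opening. This would affect the contours of your larger areas as well, though.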
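A small sketch of that contour effect, using a made-up image (`img` and its coordinates are assumptions for illustration). The function is called here as `ndimage.binary_opening`, with its default structuring element, the 3x3 cross:

```python
import numpy as np
import scipy.ndimage as ndimage

# Hypothetical image: a solid 12x12 square plus one isolated pixel
img = np.zeros((32, 32), dtype=bool)
img[10:-10, 10:-10] = True
img[3, 3] = True

# Opening = erosion followed by dilation, here with the default 3x3 cross
opened = ndimage.binary_opening(img)

print(opened[3, 3])                 # the isolated pixel
print(opened[10, 10])               # a corner of the square
print(img.sum() - opened.sum())     # total number of pixels lost
```

The isolated pixel is removed, but the four corner pixels of the square are clipped as well, which is the change to larger outlines mentioned above.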

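As for a DIY solution, I would use a summed area table to count the number of items in every 3x3 subimage, subtract the value of the central pixel from that, and then zero all center points where the result is zero. To handle the borders properly, first pad the array with zeros: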

sat = np.pad(square, pad_width=1, mode='constant', constant_values=0)
sat = np.cumsum(np.cumsum(sat, axis=0), axis=1)
sat = np.pad(sat, ((1, 0), (1, 0)), mode='constant', constant_values=0)
# These are all the possible overlapping 3x3 windows sums
sum3x3 = sat[3:, 3:] + sat[:-3, :-3] - sat[3:, :-3] - sat[:-3, 3:]
# This takes away the central pixel value
sum3x3 -= square
# This zeros all the isolated pixels
square[sum3x3 == 0] = 0

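The implementation above works, but takes no special care to avoid creating intermediate arrays, so you can probably shave off some execution time by refactoring it adequately.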
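As one possible alternative (a different technique from the summed area table, not a refactoring of it), the same 8-neighbor count can be obtained with a single convolution. The function name `remove_isolated_convolve` is hypothetical, and this sketch has not been benchmarked against the code above:

```python
import numpy as np
import scipy.ndimage as ndimage

def remove_isolated_convolve(arr):
    """Zero cells whose 8 neighbors are all zero, using one 3x3 convolution."""
    # Kernel of ones with a zero center: counts neighbors, ignores the cell itself
    kernel = np.ones((3, 3), dtype=int)
    kernel[1, 1] = 0
    # mode='constant' with cval=0 treats everything outside the array as zero,
    # which plays the role of the zero padding in the summed area table version
    neighbor_count = ndimage.convolve((arr > 0).astype(int), kernel,
                                      mode='constant', cval=0)
    out = arr.copy()
    out[neighbor_count == 0] = 0
    return out
```

Calling `remove_isolated_convolve(square)` returns a cleaned copy and leaves `square` itself unchanged.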

Answer 3

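Belated thanks to both Jaime and Kazemakase for their replies. The manual neighbor-checking method did remove all isolated patches, but it also removed patches attached to others by a single corner (i.e. the one to the upper right of the square in the sample array). The summed area table works perfectly and is very fast on the small sample array, but slows down on larger arrays.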
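The corner-attached patches are lost because the check in Answer 1 only looks left, right, above and below. A minimal sketch of the same idea extended to the four diagonals might look like this (the function name `has_any_neighbor` is hypothetical):

```python
import numpy as np

def has_any_neighbor(arr):
    """Boolean mask: True where at least one of the 8 neighbors is non-zero."""
    occupied = arr > 0
    has_neighbor = np.zeros(arr.shape, dtype=bool)
    # Edge neighbors, as in Answer 1
    has_neighbor[:, 1:] |= occupied[:, :-1]     # left
    has_neighbor[:, :-1] |= occupied[:, 1:]     # right
    has_neighbor[1:, :] |= occupied[:-1, :]     # above
    has_neighbor[:-1, :] |= occupied[1:, :]     # below
    # Diagonal neighbors, so corner-attached cells are kept
    has_neighbor[1:, 1:] |= occupied[:-1, :-1]      # upper left
    has_neighbor[1:, :-1] |= occupied[:-1, 1:]      # upper right
    has_neighbor[:-1, 1:] |= occupied[1:, :-1]      # lower left
    has_neighbor[:-1, :-1] |= occupied[1:, 1:]      # lower right
    return has_neighbor
```

It would be used in the same way as the original mask, e.g. `square[~has_any_neighbor(square)] = 0`.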

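I ended up following an approach using ndimage, which seems to work efficiently for very large and sparse arrays (0.91 sec for a 5000 x 5000 array vs. 1.17 sec for the summed area table approach). I first generate a labelled array of unique IDs for each discrete region, calculate the size of each ID, mask the size array to focus only on blobs of size == 1, then index the original array and set the IDs with size == 1 to 0: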

def filter_isolated_cells(array, struct):
    """ Return array with completely isolated single cells removed
    :param array: Array with completely isolated single cells
    :param struct: Structure array for generating unique regions
    :return: Array with minimum region size > 1
    """

    filtered_array = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered_array, structure=struct)
    id_sizes = np.array(ndimage.sum(array, id_regions, range(num_ids + 1)))
    area_mask = (id_sizes == 1)
    filtered_array[area_mask[id_regions]] = 0
    return filtered_array

# Run function on sample array
filtered_array = filter_isolated_cells(square, struct=np.ones((3,3)))

# Plot output, with all isolated single cells removed
plt.imshow(filtered_array, cmap=plt.cm.gray, interpolation='nearest')
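The same labelling idea generalizes to a minimum region size. The sketch below is a hypothetical variant (the name `remove_small_regions` and the `min_size` parameter are not part of the original function) that counts pixels per label with `np.bincount`. Unlike `ndimage.sum(array, ...)`, which adds up the array values, this counts pixels, so it does not rely on every non-zero cell being exactly 1:

```python
import numpy as np
import scipy.ndimage as ndimage

def remove_small_regions(array, struct, min_size=2):
    """Return a copy of array with regions smaller than min_size pixels zeroed."""
    filtered = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered, structure=struct)
    # Pixel count per label; index 0 is the background
    sizes = np.bincount(id_regions.ravel(), minlength=num_ids + 1)
    too_small = sizes < min_size
    too_small[0] = False  # never touch the background label
    # Look up each pixel's label in the mask and zero the small regions
    filtered[too_small[id_regions]] = 0
    return filtered
```

With `min_size=2` this should behave like `filter_isolated_cells` on a binary array; larger values also remove small multi-cell specks.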

Result: [image not preserved: the filtered sample array with all isolated single cells removed]

python

numpy

python-2.6

scipy

ndimage