How to remove overlapping blocks from numpy array?

Question

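I am using the cv2.goodFeaturesToTrack function to find feature points in an image. The end goal is to extract square blocks of a certain size, with the feature points as the centers of those blocks.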

However, many of the feature points are close to each other, so the blocks overlap, which is not what I want.

This is an example of all the feature points (centers):

array([[3536., 1419.],
       [2976., 1024.],
       [3504., 1400.],
       [3574., 1505.],
       [3672., 1453.],
       [3671., 1442.],
       [3489., 1429.],
       [3108.,  737.]])

Let's say I want to find the first n blocks with blockRadius = 400 that do not overlap. Any ideas on how to achieve this?

Answer 1

You'll need some iteration to do this, because recurrent dropping like this can't be vectorized. I think something like this will work:

import numpy as np
from scipy.spatial.distance import pdist, squareform

c = np.array([[3536., 1419.],
       [2976., 1024.],
       [3504., 1400.],
       [3574., 1505.],
       [3672., 1453.],
       [3671., 1442.],
       [3489., 1429.],
       [3108.,  737.]])

dists = squareform(pdist(c, metric = 'chebyshev'))     # distance matrix, chebyshev here since you seem to want blocks
indices = np.arange(c.shape[0])  # indices that haven't been dropped (all to start)
out = [0]                        # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400]  # drop indices that are inside the threshold (this also drops the current index)
        out.append(indices[0])   # add the next index that hasn't been dropped to the output
    except IndexError:
        break   # once you run out of indices, indices[0] raises an IndexError and you're done
print(out)
[0, 1]
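
The loop above can also be wrapped in a function that stops after the first n blocks, which is what the question asks for. This is only a sketch: the function name and the `min_dist` and `n` parameters are made up for illustration, and it computes one row of Chebyshev distances at a time instead of the full matrix. Note that if blockRadius is meant as half the side length of a block, the threshold would presumably be 2 * blockRadius; the code above uses 400 directly, which corresponds to 400x400 blocks.

```python
import numpy as np

def first_non_overlapping(centers, min_dist, n=None):
    """Greedily keep centers, in order, that are farther than min_dist
    (Chebyshev distance) from every center kept so far.

    Hypothetical helper: `min_dist` and `n` are illustrative parameters.
    """
    centers = np.asarray(centers, dtype=float)
    remaining = np.arange(len(centers))   # indices that haven't been dropped
    kept = []
    while remaining.size and (n is None or len(kept) < n):
        current = remaining[0]
        kept.append(int(current))
        # Chebyshev distance from the current center to all remaining ones
        d = np.abs(centers[remaining] - centers[current]).max(axis=1)
        remaining = remaining[d > min_dist]   # also drops `current` (distance 0)
    return kept

c = np.array([[3536., 1419.],
              [2976., 1024.],
              [3504., 1400.],
              [3574., 1505.],
              [3672., 1453.],
              [3671., 1442.],
              [3489., 1429.],
              [3108.,  737.]])

print(first_non_overlapping(c, 400))        # same threshold as above
print(first_non_overlapping(c, 400, n=1))   # stop after the first block
```

With the same threshold this returns the same indices as the loop above, and it never holds more than one row of distances in memory.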

Let's try it with a whole bunch of points:

np.random.seed(42)
c = np.random.rand(10000, 2) * 800
dists = squareform(pdist(c, metric = 'chebyshev'))     # distance matrix, chebyshev here since you seem to want squares
indices = np.arange(c.shape[0])  # indices that haven't been dropped (all to start)
out = [0]                        # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400]  # drop indices that are inside the threshold (this also drops the current index)
        out.append(indices[0])   # add the next index that hasn't been dropped to the output
    except IndexError:
        break   # once you run out of indices, indices[0] raises an IndexError and you're done
print(out, pdist(c[out], metric = 'chebyshev'))
[0, 2, 6, 17] [635.77582886 590.70015659 472.87353138 541.13920029 647.69071411
 476.84658995]

So we get 4 points (which makes sense, since four 400x400 blocks tile an 800x800 space), the kept indices are mostly low (17 << 10000), and the distance between kept points is always > 400.
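
The reason the Chebyshev metric fits here: two axis-aligned squares with the same side length overlap exactly when their centers are closer than one side length on both axes, i.e. when max(|dx|, |dy|) is smaller than the side. A small sketch of that check, which can be used to sanity-check a set of kept centers (the helper names and the `side` parameter are invented for this example):

```python
import numpy as np

def blocks_overlap(c1, c2, side):
    """True if two axis-aligned squares of the given side, centered at c1 and c2, overlap."""
    dx, dy = np.abs(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))
    return bool(max(dx, dy) < side)   # Chebyshev distance below the side length

def all_disjoint(centers, side):
    """True if no pair of squares centered at `centers` overlaps."""
    centers = np.asarray(centers, dtype=float)
    # Pairwise Chebyshev distances via broadcasting, shape (k, k)
    d = np.abs(centers[:, None, :] - centers[None, :, :]).max(axis=2)
    upper = np.triu_indices(len(centers), k=1)   # each pair once, no diagonal
    return bool(np.all(d[upper] >= side))

print(blocks_overlap((3536, 1419), (3504, 1400), 400))   # centers only 32 apart
print(all_disjoint([[3536, 1419], [2976, 1024]], 400))   # centers 560 apart in x
```

This builds a k x k matrix, so it is meant for the short list of kept centers, not for the full set of candidates.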

Answer 2

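You can get closer using scipy.spatial.KDTree, although it doesn't support querying blocks that contain different numbers of points. So it can be combined with another library, python-igraph, which can quickly find connected components of close points: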

import numpy as np
from scipy.spatial import KDTree
import igraph as ig

data = np.array([[3536., 1419.],
       [2976., 1024.],
       [3504., 1400.],
       [3574., 1505.],
       [3672., 1453.],
       [3671., 1442.],
       [3489., 1429.],
       [3108.,  737.]])
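edges1 = KDTree(data[:,:1]).query_pairs(r=400)  # pairs (i, j) whose x coordinates are within 400
edges2 = KDTree(data[:,1:]).query_pairs(r=400)  # pairs (i, j) whose y coordinates are within 400
g = ig.Graph(n = len(data), edges=list(edges1 & edges2))  # keep only pairs that are close on both axes
i = g.clusters()  # connected components; newer python-igraph releases call this g.connected_components()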

So the clusters correspond to sequences of block point indices, held in an igraph internal type. Here is a quick preview:

>>> print(i)
Clustering with 8 elements and 2 clusters
[0] 0, 2, 3, 4, 5, 6
[1] 1, 7
>>> pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #number of colors used
>>> color = pal.get_many(i.membership) #list of color tags
>>> ig.plot(g,  bbox = (200, 100), layout=g.layout('circle'), vertex_label=g.vs.indices,
...         vertex_color = color, vertex_size = 12, vertex_label_size = 8)

Usage example:

>>> [data[n] for n in i] #or list(i)
[array([[3536., 1419.],
        [3504., 1400.],
        [3574., 1505.],
        [3672., 1453.],
        [3671., 1442.],
        [3489., 1429.]]),
 array([[2976., 1024.],
        [3108.,  737.]])]

Note: this method works with pairs of close points instead of an n*n matrix, which is more memory-efficient in some cases.
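
To get from clusters back to non-overlapping block centers, one option is to keep a single point per cluster. Points in different clusters are never within 400 of each other on both axes (otherwise an edge would have joined them), so the kept centers cannot overlap. This is a sketch with the cluster indices hard-coded from the printed clustering above; taking the lowest index of each cluster is an arbitrary choice for illustration.

```python
import numpy as np

data = np.array([[3536., 1419.],
                 [2976., 1024.],
                 [3504., 1400.],
                 [3574., 1505.],
                 [3672., 1453.],
                 [3671., 1442.],
                 [3489., 1429.],
                 [3108.,  737.]])

# Cluster memberships copied from the printed clustering above
clusters = [[0, 2, 3, 4, 5, 6], [1, 7]]

# Keep one representative per cluster (here: the lowest index)
representatives = [min(members) for members in clusters]
centers = data[representatives]

# Chebyshev distance between the two kept centers
separation = np.abs(centers[0] - centers[1]).max()
print(representatives, separation)
```

Keeping one point per cluster is more conservative than the greedy loop in Answer 1: a long chain of nearby points forms a single cluster and yields one block, even if its two ends are far enough apart to hold separate blocks.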


python

opencv

numpy

feature-extraction

overlapping