Minimal edge-to-edge euclidean distance between labeled components in numpy array
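I have a number of different shapes in a large numpy array, and I want to calculate the edge-to-edge euclidean distance between them using numpy and scipy.
Note: I did search, and this is different from previous questions on Stack Overflow, because I want the minimum distance between labeled patches within a single array, not the distance between points or between separate arrays as those questions asked.
My current approach uses a KDTree, but it is extremely inefficient for large arrays. Essentially, I look up the coordinates of every labeled component and calculate the distance to all other components. Finally, as an example, the average minimum distance is calculated.
I am looking for a smarter approach in Python, preferably without any additional modules.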
import numpy
from scipy import spatial
from scipy import ndimage

# Testing array (numpy.int was removed in NumPy 1.24, so use the builtin int)
a = numpy.zeros((8,8), dtype=int)
a[2,2] = a[3,1] = a[3,2] = 1
a[2,6] = a[2,7] = a[1,6] = 1
a[5,5] = a[5,6] = a[6,5] = a[6,6] = a[7,5] = a[7,6] = 1

# label it
labeled_array, numpatches = ndimage.label(a)

# For number of patches
closest_points = []
for patch in range(1, numpatches + 1):
    # Get coordinates of first patch
    x, y = numpy.where(labeled_array == patch)
    coords = numpy.vstack((x, y)).T  # transform into array
    # Build a KDTree of the coords of the first patch
    mt = spatial.cKDTree(coords)
    for patch2 in range(1, numpatches + 1):
        if patch == patch2:  # If patch is the same as the first, skip
            continue
        # Get coordinates of second patch
        x2, y2 = numpy.where(labeled_array == patch2)
        coords2 = numpy.vstack((x2, y2)).T
        # Now loop through points
        min_res = []
        for pi in range(len(coords2)):
            dist, indexes = mt.query(coords2[pi])  # query the distance and index
            min_res.append([dist, pi])
        m = numpy.vstack(min_res)
        # Row with the smallest distance is the closest point; column 1 holds
        # its index, stored as a float, so cast it back to int before indexing
        closest_points.append(coords2[int(m[numpy.argmin(m[:, 0]), 1])])

# The average euclidean distance can then be calculated like this:
spatial.distance.pdist(closest_points, metric="euclidean").mean()
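A minimal sketch that keeps the cKDTree idea but, assuming the goal is one minimum distance per pair of patches, queries all pixels of a patch in a single vectorized `query` call and stores the per-pair minimum in a symmetric matrix instead of collecting closest points. The helper name `patch_min_distances` is hypothetical.

```python
import numpy as np
from scipy import ndimage, spatial


def patch_min_distances(labeled_array, numpatches):
    """Symmetric matrix of minimum edge-to-edge distances between patches.

    Hypothetical helper: builds one cKDTree per patch and queries it with
    all pixels of every other patch at once.
    """
    coords = [np.argwhere(labeled_array == lab) for lab in range(1, numpatches + 1)]
    trees = [spatial.cKDTree(c) for c in coords]
    D = np.zeros((numpatches, numpatches))
    for i in range(numpatches):
        for j in range(i + 1, numpatches):
            # Distance from every pixel of patch j to its nearest pixel in patch i
            dists, _ = trees[i].query(coords[j])
            D[i, j] = D[j, i] = dists.min()
    return D


# Testing array from the question
a = np.zeros((8, 8), dtype=int)
a[2, 2] = a[3, 1] = a[3, 2] = 1
a[2, 6] = a[2, 7] = a[1, 6] = 1
a[5, 5] = a[5, 6] = a[6, 5] = a[6, 6] = a[7, 5] = a[7, 6] = 1
labeled_array, numpatches = ndimage.label(a)

D = patch_min_distances(labeled_array, numpatches)
# Average over each unordered pair of patches (strict upper triangle)
mean_min_dist = D[np.triu_indices(numpatches, k=1)].mean()
print(mean_min_dist)
```

For this test array the three pairwise minima are 3, √13 and 4. Memory stays proportional to the number of nonzero pixels, at the cost of a Python loop over patch pairs.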
EDIT
I just tested the solution proposed by @morningsun, and it is a huge speed improvement. However, the returned values differ slightly:
# Consider for instance the following array
a = numpy.zeros((8,8), dtype=int)
a[2,2] = a[2,6] = a[5,5] = 1
labeled_array, numpatches = ndimage.label(a)
# Previous approach using KDtrees and pdist
# (kd() is the KDTree loop above wrapped in a function that returns closest_points)
b = kd(labeled_array, numpatches)
spatial.distance.pdist(b, metric="euclidean").mean()
#> 3.0413115592767102
# New approach: take the lower triangle of the matrix and keep only the non-zero distances
b = numpy.tril(feature_dist(labeled_array))
b[b == 0] = numpy.nan
numpy.nanmean(b)
#> 3.8016394490958878
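A sketch of an alternative to the `tril` plus NaN step: index the strict upper triangle with `np.triu_indices(k, 1)` so the zero diagonal never enters the mean. Because the three features in this example are single pixels, their distance matrix can be built directly with `cdist` (an assumption that only holds for this toy array); for real features you would index the matrix returned by `feature_dist` the same way.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

# The three single-pixel features from the EDIT example
points = np.array([[2, 2], [2, 6], [5, 5]])
# For single-pixel features the feature-to-feature matrix is the point distance matrix
D = cdist(points, points)

# Strict upper triangle: each unordered pair once, diagonal excluded
iu = np.triu_indices(len(D), k=1)
pair_dists = D[iu]
print(pair_dists.mean())
# Same values, in the same order, as SciPy's condensed distance vector
print(np.allclose(pair_dists, pdist(points)))
```

The mean is taken over the distances 4, √18 and √10, which is the 3.8016... value from the new approach.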
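EDIT 2
Ah, got it. Passing closest_points to spatial.distance.pdist does not give the feature-to-feature distances: it measures the distances between all of the collected points, including each point against its own duplicate, so the earlier values were wrong.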
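A small check of that explanation for the three single-pixel features at (2, 2), (2, 6) and (5, 5). The KDTree loop appends, for every feature, the closest pixel of each other feature, so each pixel appears twice; the array below is written out by hand to match what the loop produces for this input.

```python
import numpy as np
from scipy.spatial.distance import pdist

# closest_points as the KDTree loop builds it for this input
closest_points = np.array([
    [2, 6], [5, 5],  # closest pixels seen from feature (2, 2)
    [2, 2], [5, 5],  # seen from feature (2, 6)
    [2, 2], [2, 6],  # seen from feature (5, 5)
])
all_pairs = pdist(closest_points)
# 6 points give 15 point pairs, 3 of which are a point against its duplicate
print(len(all_pairs), np.count_nonzero(all_pairs == 0))
print(all_pairs.mean())
```

The mean runs over 15 point pairs, including those 3 zero distances, rather than over the 3 feature pairs, which reproduces the 3.0413... value from the first EDIT.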
Answer: Here is a fully vectorized approach to find the distance matrix between labeled objects:
import numpy as np
from scipy.spatial.distance import cdist

def feature_dist(labeled):
    """
    Takes a labeled array as returned by scipy.ndimage.label and
    returns an inter-feature distance matrix.
    """
    I, J = np.nonzero(labeled)
    labels = labeled[I, J]
    coords = np.column_stack((I, J))

    # Sort nonzero pixels by label so each feature forms a contiguous block
    sorter = np.argsort(labels)
    labels = labels[sorter]
    coords = coords[sorter]

    # Squared distances between every pair of nonzero pixels (N x N)
    sq_dists = cdist(coords, coords, 'sqeuclidean')

    # Index at which each feature's block starts
    start_idx = np.flatnonzero(np.r_[1, np.diff(labels)])
    # Minimum over each feature's columns, then over each feature's rows
    nonzero_vs_feat = np.minimum.reduceat(sq_dists, start_idx, axis=1)
    feat_vs_feat = np.minimum.reduceat(nonzero_vs_feat, start_idx, axis=0)
    return np.sqrt(feat_vs_feat)
This approach requires O(N²) memory, where N is the number of nonzero pixels. If that is too demanding, you can "de-vectorize" it along one axis (add a for loop).
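A sketch of that de-vectorization, looping over features along the row axis. Each iteration holds only a (pixels in one feature) × N block of squared distances, so peak memory is bounded by the largest feature instead of N². The name `feature_dist_loop` is hypothetical; it is intended to return the same matrix as `feature_dist`.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial.distance import cdist


def feature_dist_loop(labeled):
    """Inter-feature minimum distance matrix, one feature (row) at a time."""
    I, J = np.nonzero(labeled)
    labels = labeled[I, J]
    coords = np.column_stack((I, J))
    # Sort pixels by label so each feature occupies a contiguous run of columns
    sorter = np.argsort(labels, kind="stable")
    labels = labels[sorter]
    coords = coords[sorter]
    start_idx = np.flatnonzero(np.r_[1, np.diff(labels)])
    feature_ids = labels[start_idx]

    out = np.empty((len(start_idx), len(start_idx)))
    for row, lab in enumerate(feature_ids):
        # Squared distances from this feature's pixels to all nonzero pixels
        block = cdist(coords[labels == lab], coords, "sqeuclidean")
        # Minimum within each feature's column run, then over this feature's pixels
        out[row] = np.minimum.reduceat(block, start_idx, axis=1).min(axis=0)
    return np.sqrt(out)


# Testing array from the question
a = np.zeros((8, 8), dtype=int)
a[2, 2] = a[3, 1] = a[3, 2] = 1
a[2, 6] = a[2, 7] = a[1, 6] = 1
a[5, 5] = a[5, 6] = a[6, 5] = a[6, 6] = a[7, 5] = a[7, 6] = 1
labeled_array, numpatches = ndimage.label(a)
D = feature_dist_loop(labeled_array)
print(D)
```

The diagonal is 0 (each feature against itself) and the off-diagonal minima for this array are 3, √13 and 4.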