Getting the coordinates of elements in clusters without a loop in numpy
I have a 2D array in which I label clusters using the ndimage.label() function, like this:
import numpy as np
from scipy.ndimage import label
input_array = np.array([[0, 1, 1, 0],
                        [1, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 0, 1]])
labeled_array, _ = label(input_array)
# Result:
# labeled_array == [[0, 1, 1, 0],
#                   [1, 1, 0, 0],
#                   [0, 0, 0, 2],
#                   [0, 0, 0, 2]]
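I can get the element counts, centroids, or bounding boxes of the labeled clusters. But I would also like to get the coordinates of every element in each cluster. Something like this (it does not have to be this exact data structure; any data structure will do):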
{
1: [(0, 1), (0, 2), (1, 0), (1, 1)], # Coordinates of the elements that have the label "1"
2: [(2, 3), (3, 3)] # Coordinates of the elements that have the label "2"
}
I can loop over the list of labels and call np.where() for each of them, but I wonder whether there is a way to do this without a loop, so that it would be faster?
You can build a coordinate map, sort it by label, and split it:
# Get the indexes (coordinates) of the labeled (non-zero) elements
ind = np.argwhere(labeled_array)
# Get the labels corresponding to those indexes above
labels = labeled_array[tuple(ind.T)]
# Sort both arrays so that lower label numbers appear before higher label numbers. This is not for cosmetic reasons,
# but we will use sorted nature of these label indexes when we use the "diff" method in the next step.
sort = labels.argsort()
ind = ind[sort]
labels = labels[sort]
# Find the split points where a new label number starts in the ordered label numbers
splits = np.flatnonzero(np.diff(labels)) + 1
# Create a data structure out of the label numbers and indexes (coordinates).
# The first argument to the zip is: we take the 0th label number and the label numbers at the split points
# The second argument is the indexes (coordinates), split at split points
# so the length of both arguments to the zip function is the same
result = {k: v for k, v in zip(labels[np.r_[0, splits]],
                               np.split(ind, splits))}
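The steps above can be wrapped into a helper. This is only a sketch under a few assumptions: the name `cluster_coordinates` is hypothetical, `kind="stable"` is chosen so that coordinates within each label keep the row-major order produced by `np.argwhere`, and each group is converted to a list of tuples to match the structure shown in the question.

```python
import numpy as np
from scipy.ndimage import label


def cluster_coordinates(labeled_array):
    """Map each non-zero label to a list of coordinate tuples (hypothetical helper)."""
    ind = np.argwhere(labeled_array)            # coordinates of non-zero elements
    if ind.shape[0] == 0:
        return {}                               # nothing is labeled
    labels = labeled_array[tuple(ind.T)]        # label found at each coordinate
    order = labels.argsort(kind="stable")       # stable sort keeps row-major order per label
    ind, labels = ind[order], labels[order]
    splits = np.flatnonzero(np.diff(labels)) + 1  # positions where a new label starts
    keys = labels[np.r_[0, splits]]
    groups = np.split(ind, splits)
    # Converting to Python tuples is itself a Python-level loop over the elements.
    return {int(k): [tuple(int(i) for i in row) for row in g]
            for k, g in zip(keys, groups)}


input_array = np.array([[0, 1, 1, 0],
                        [1, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 0, 1]])
labeled_array, _ = label(input_array)
coords = cluster_coordinates(labeled_array)
print(coords)  # {1: [(0, 1), (0, 2), (1, 0), (1, 1)], 2: [(2, 3), (3, 3)]}
```

If speed matters, it is probably better to skip the tuple conversion and keep the per-label coordinate arrays returned by `np.split`.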
Method 1:
You can try this, which still loops, but inside a dictionary comprehension:
{k: list(zip(*np.where(labeled_array == k))) for k in range(1,3)}
Output:
{1: [(0, 1), (0, 2), (1, 0), (1, 1)], 2: [(2, 3), (3, 3)]}
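The comprehension above hard-codes the label range as `range(1, 3)`. As a small sketch, the second value returned by `label()` (the number of features found) can supply the upper bound instead; the variable names `num_features` and `coords_by_label` are chosen here for illustration.

```python
import numpy as np
from scipy.ndimage import label

input_array = np.array([[0, 1, 1, 0],
                        [1, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 0, 1]])

# label() returns the labeled array and the number of features it found
labeled_array, num_features = label(input_array)

# One np.where() call per label; labels run from 1 to num_features inclusive
coords_by_label = {
    k: list(zip(*np.where(labeled_array == k)))
    for k in range(1, num_features + 1)
}
```

Each `np.where()` call scans the whole array, so the total work grows with the number of labels.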
Method 2 (slow):
Here is an approach using pandas, which is probably slower than Mad Physicist's approach:
import pandas as pd

(pd.DataFrame(labeled_array)
   .stack()
   .reset_index()
   .groupby(0).agg(list)[1:]
   .apply(lambda x: list(zip(*x)), axis=1)
).to_dict()
Output:
{1: [(0, 1), (0, 2), (1, 0), (1, 1)], 2: [(2, 3), (3, 3)]}
Timings on this data:
Dictionary comprehension:
8.73 µs ± 216 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Coordinate map, sort and split:
57.3 µs ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
pandas:
5.16 ms ± 283 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
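These timings are for the 4x4 example with only two labels, so they may not carry over to larger inputs. The sketch below shows one way to re-measure on a bigger array; the random 100x100 input, the function names `with_loop` and `with_sort_split`, and the run count are all assumptions made for illustration, and no particular result is implied.

```python
import timeit

import numpy as np
from scipy.ndimage import label

# Hypothetical larger test input: random 100x100 binary array
rng = np.random.default_rng(0)
big_input = (rng.random((100, 100)) < 0.3).astype(int)
big_labeled, n_labels = label(big_input)


def with_loop(arr, n):
    # One full-array comparison per label
    return {k: np.argwhere(arr == k) for k in range(1, n + 1)}


def with_sort_split(arr):
    # Single pass: collect coordinates, sort by label, split at label changes
    ind = np.argwhere(arr)
    labels = arr[tuple(ind.T)]
    order = labels.argsort(kind="stable")
    ind, labels = ind[order], labels[order]
    splits = np.flatnonzero(np.diff(labels)) + 1
    return dict(zip(labels[np.r_[0, splits]].tolist(), np.split(ind, splits)))


t_loop = timeit.timeit(lambda: with_loop(big_labeled, n_labels), number=3)
t_split = timeit.timeit(lambda: with_sort_split(big_labeled), number=3)
print(f"{n_labels} labels: loop {t_loop:.4f} s, sort/split {t_split:.4f} s (3 runs each)")
```

Both functions return coordinate arrays rather than lists of tuples, so the comparison measures only the grouping step.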