Fast way of relabeling array elements or making elements contiguous in Python
I have a huge 3D array to process. I want to relabel its elements in the following way:
import numpy as np
given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
required_array = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
I know there is a relabel_sequential method in skimage.segmentation, but it is too slow for me. Any ideas for doing this quickly would be much appreciated.
Try this and see if it is fast enough. Call numpy.unique with the argument return_inverse=True and use the inverse array it returns:
In [52]: given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
In [53]: u, inv = np.unique(given_array, return_inverse=True)
In [54]: inv
Out[54]: array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
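Since the question mentions a 3D array, here is a minimal sketch of the same idea applied to a multi-dimensional input. The helper name relabel_unique and the small sample volume are hypothetical, made up for illustration. The reshape is included because, depending on the NumPy version, the inverse may come back flattened rather than in the input's shape.

```python
import numpy as np

def relabel_unique(labels):
    """Relabel an integer array of any shape to 0..k-1 via np.unique (hypothetical helper)."""
    labels = np.asarray(labels)
    _, inverse = np.unique(labels, return_inverse=True)
    # Harmless if inverse already has the input shape; needed if it is flat.
    return inverse.reshape(labels.shape)

# Small made-up 3D example with the same label values as in the question.
volume = np.array([[[1, 1], [3, 23]],
                   [[8, 5], [5, 1]]])
relabeled = relabel_unique(volume)
print(relabeled.shape)  # (2, 2, 2)
```

Note that np.unique sorts the data internally, so on very large arrays its cost is dominated by that sort.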
The fastest approach is probably to write a dedicated numba function that does exactly what you need.
Example:
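from numba import njit
import numpy as np

@njit()
def relabel(array):
    i = 0
    n = -1        # Current output label; becomes 0 at the first element
    previous = 0  # Last input value seen
    while i < len(array):
        if previous != array[i]:
            # A new run of equal values starts here
            previous = array[i]
            n += 1
        array[i] = n  # Overwrite the input in place
        i += 1

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
relabel(given_array)
given_array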
Output:
array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
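This example makes several assumptions about the input: the array is sorted, its first number is positive (strictly, nonzero, because previous starts at 0), it is one-dimensional, and you want to overwrite it in place.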
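For comparison, the same run-based relabeling of a sorted array can be sketched in plain NumPy without depending on the first value or overwriting the input. This is only an illustrative alternative, not the answer's method; the function name relabel_sorted is hypothetical, and the input is still assumed to be sorted (in flattened order if it is multi-dimensional).

```python
import numpy as np

def relabel_sorted(array):
    """Relabel a sorted array to 0..k-1 by counting value changes (hypothetical helper)."""
    array = np.asarray(array)
    if array.size == 0:
        return np.zeros(array.shape, dtype=np.intp)
    flat = array.ravel()
    # True wherever an element differs from its predecessor, i.e. a new run starts.
    starts = flat[1:] != flat[:-1]
    # The first element always gets label 0; each new run increments the label.
    labels = np.concatenate(([0], np.cumsum(starts)))
    return labels.reshape(array.shape)

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
print(relabel_sorted(given_array))
# [0 0 0 1 1 2 2 2 3 3 3 3 3 4 4 4]
```

Because the first label is fixed at 0 rather than derived from a sentinel, arrays starting with 0 or a negative number are handled as well.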
If the given array is not sorted, this will be faster than sorting it first:
from numba import njit
import numpy as np

@njit()
def relabel_fast(array, count):
    # First pass: count how often each value occurs
    i = 0
    while i < len(array):
        data = array[i]
        count[data] += 1
        i += 1
    a = 1 # Position in count (starts at 1, so values are assumed to be positive)
    b = 0 # Position in array
    c = 0 # The current output number
    # Second pass: write each label as many times as its value occurred
    while a < len(count):
        d = 0 # The number of 'c' to output
        if count[a] > 0:
            while d < count[a]:
                array[b] = c
                b += 1
                d += 1
            c += 1
        a += 1

def relabel(given_array):
    # Allocate the count array (one slot per value from 0 to the maximum) before calling the Numba function
    count = np.zeros(np.max(given_array) + 1, dtype=int)
    relabel_fast(given_array, count)

#given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
relabel(given_array)
given_array
Output:
array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
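Note that the counting function above writes the labels out in sorted order, so each element's original position is lost. If the positions matter, as they usually do for a labeled 3D volume, a lookup table can be used instead. The following is a minimal sketch in plain NumPy, assuming non-negative integer labels whose maximum is small enough for a table of that size to fit in memory; the name relabel_keep_positions is hypothetical.

```python
import numpy as np

def relabel_keep_positions(array):
    """Map labels to 0..k-1 via a lookup table, keeping shape and positions (hypothetical helper)."""
    array = np.asarray(array)
    if array.size == 0:
        return array.copy()
    if array.min() < 0:
        raise ValueError("labels must be non-negative integers")
    # Mark which values occur; one slot per value from 0 to the maximum.
    present = np.zeros(array.max() + 1, dtype=bool)
    present[array] = True
    # The running count of present values, minus one, is the new label of each value.
    lookup = np.cumsum(present) - 1
    return lookup[array]

given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
print(relabel_keep_positions(given_array))
# [0 4 0 1 3 1 2 2 3 3 3 2 3 4 4 0]
```

This avoids sorting entirely: it makes one pass to mark the values, one cumulative sum over the table, and one indexing pass, and it works for arrays of any shape.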