Fast way of relabeling array elements or making elements contiguous in Python

I have a huge 3D array to process. I want to relabel the elements as follows:

import numpy as np
given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
required_array = np.array([0, 0, 0, 1, 1,  2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])

I know there is a relabel_sequential method in skimage.segmentation, but it is slow for me. Any ideas for doing this quickly would be appreciated.

Try this and see if it is fast enough. Call numpy.unique with return_inverse=True and use the inverse array it returns:

In [52]: given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])             

In [53]: u, inv = np.unique(given_array, return_inverse=True)                                    

In [54]: inv                                                                                     
Out[54]: array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
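Since the question mentions a 3D array, here is a minimal sketch of the same idea for an N-dimensional input. The function name relabel_nd is made up for illustration. Depending on the NumPy version, the inverse may come back flattened, so the sketch reshapes it to the input shape, which is harmless if it already has that shape.

```python
import numpy as np

def relabel_nd(array):
    # Hypothetical helper: relabel values to 0..k-1 in sorted order of the unique values.
    # np.unique sorts internally, so the input does not need to be sorted.
    _, inv = np.unique(array, return_inverse=True)
    # Reshape in case the inverse was returned as a flat array.
    return inv.reshape(array.shape)

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23]).reshape(2, 2, 4)
relabeled = relabel_nd(given_array)
print(relabeled.shape)  # (2, 2, 4)
```

Unlike the in-place approaches, this returns a new array and leaves the input untouched.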

The fastest approach should be to write a dedicated numba function that does exactly what you need.

Example

from numba import njit
import numpy as np

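@njit()
def relabel(array):
    # Relabels a sorted 1D array in place with consecutive labels 0, 1, 2, ...
    i = 0
    n = -1        # Current label; incremented each time a new value starts
    previous = 0  # Last value seen; assumes the first element is not 0
    while i < len(array):
        if previous != array[i]:
            previous = array[i]
            n += 1
        array[i] = n
        i += 1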

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
relabel(given_array)

given_array

Output

array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])

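This example makes several assumptions about the input: the array is sorted, the first number is positive (previous starts at 0, so a leading 0 would not start a new label), the array is one-dimensional, and you want to overwrite it in place.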
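For comparison, the same run-detection logic can be sketched in plain NumPy without numba. This is only an illustration under the same main assumption (a sorted, one-dimensional array). The name relabel_sorted is made up here. It returns a new array instead of overwriting the input, and it does not depend on the value of the first element.

```python
import numpy as np

def relabel_sorted(array):
    # Hypothetical helper for a sorted 1D array.
    if array.size == 0:
        return np.zeros(0, dtype=np.intp)
    # True wherever an element differs from its predecessor, i.e. a new value starts.
    starts = array[1:] != array[:-1]
    # The running count of value changes is the label; the first element gets label 0.
    return np.concatenate(([0], np.cumsum(starts)))

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
print(relabel_sorted(given_array))
# [0 0 0 1 1 2 2 2 3 3 3 3 3 4 4 4]
```

Whether this is faster or slower than the numba loop depends on the array size and would need to be measured.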

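If the given array is not sorted, this will be faster than sorting it first. It counts how often each value occurs and then writes the labels back in sorted order, so the result is sorted and the original element positions are not kept: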

from numba import njit
import numpy as np

@njit()
def relabel_fast(array, count):
    i = 0
    while i < len(array):
        data = array[i]
        count[data] += 1
        i += 1
    a = 0 # Position in count (start at 0 so that zero values are handled too)
    b = 0 # Position in array
    c = 0 # The current output number
    while a < len(count):
        d = 0 # The number of 'c' to output
        if count[a] > 0:
            while d < count[a]:
                array[b] = c
                b += 1
                d += 1
            c += 1
        a += 1

def relabel(given_array):
    # Allocate the count array (one slot per possible value) before calling the Numba function.
    # This requires non-negative integer values; memory grows with the maximum value.
    count = np.zeros(np.max(given_array) + 1, dtype=int)
    relabel_fast(given_array, count)


#given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
relabel(given_array)

given_array

Output

array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
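The output above comes back sorted. If each element has to keep its position (as would be needed for a labeled 3D volume), one option is a lookup table instead of counting. The following is a minimal NumPy sketch, not part of the answer above. The name relabel_keep_positions is made up, and it assumes non-negative integer values whose maximum is small enough that a table of that size fits in memory.

```python
import numpy as np

def relabel_keep_positions(array):
    # Hypothetical helper: works for any shape, returns a new array.
    # Assumes a non-empty array of non-negative integers.
    present = np.zeros(array.max() + 1, dtype=bool)
    present[array.ravel()] = True      # Mark every value that occurs
    lookup = np.cumsum(present) - 1    # Rank of each occurring value: 0, 1, 2, ...
    return lookup[array]               # Map every element through the table

given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
print(relabel_keep_positions(given_array))
# [0 4 0 1 3 1 2 2 3 3 3 2 3 4 4 0]
```

Entries of the table for values that never occur are never read, so their contents do not matter.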