How can I replace values in a Python NumPy array with the index of those values found in another array?

Question

I have an n*m array "a" and another 1-D array "b", as follows:

a = array([[ 51, 30, 20, 10],
           [ 10, 32, 65, 77],
           [ 15, 20, 77, 30]])

b = array([10, 15, 20, 30, 32, 51, 65, 77])

I want to replace every element of "a" with the index at which that element occurs in "b". For the arrays above, the desired output is:

a = array([[ 5, 3, 2, 0],
           [ 0, 4, 6, 7],
           [ 1, 2, 7, 3]])

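Note that in the real application my arrays are large: more than 30k elements each, and there are several thousand of them. I have tried for loops, but they take a long time to compute. I have also tried a similar iterative approach that uses list.index() to get the indices, but that is too slow as well.

Can anyone help me first determine the "b" indices of the elements of "a" that appear in "b", and then construct the updated "a" array?

Thanks.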

Answer 1

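If the minimal and maximal elements of a and b span a small range (or at least one small enough that a table of that size fits in RAM), this can be done very quickly with a lookup table: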

import numpy as np

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

# Range of values the table has to cover
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
# lut[v - lo] holds the index of value v in b
lut = np.zeros(hi - lo + 1, dtype=np.int64)
lut[b - lo] = np.arange(len(b))

Then:

>>> a_indices = lut[a - lo]
>>> a_indices
array([[5, 3, 2, 0],
       [0, 4, 6, 7],
       [1, 2, 7, 3]])
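
One caveat with the table above: because it is initialised with zeros, any value of a that does not occur in b silently maps to index 0. Below is a minimal sketch of a variant that marks such values with a sentinel instead. The function name lut_indices and the -1 sentinel are my own choices for illustration, and integer arrays are assumed.

```python
import numpy as np

def lut_indices(a, b, missing=-1):
    """Map each element of integer array `a` to its index in `b`.

    Hypothetical helper (not part of NumPy): values of `a` that are
    absent from `b` are mapped to `missing`.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Fill the table with the sentinel so unmatched values stay detectable.
    lut = np.full(hi - lo + 1, missing, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    return lut[a - lo]

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 99]])  # 99 does not occur in b
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

result = lut_indices(a, b)
print(result)
```

Checking (result == -1).any() afterwards tells you whether every element of a was found. The table still needs hi - lo + 1 entries, so the same small-range assumption applies.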

Answer 2

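Posting this as an answer only because it is too long for a comment. It supports orlp's solution posted above. NumPy's vectorize avoids writing an explicit loop, but it is clearly not the best approach (it still loops at Python level internally). Note that NumPy's searchsorted can only be applied directly when b is sorted, as it is here.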

import timeit
import numpy as np

a = np.random.randint(1,100,(1000,1000))
b = np.arange(0,1000,1)

def o1():
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    lut = np.zeros(hi - lo + 1, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    a2 = lut[a - lo]
    return a2

def o2():
    # Slow reference: in a copy of a, replace each value b[i] with i
    a2 = a.copy()
    fu = np.vectorize(lambda i: np.place(a2, a2==b[i], i))
    fu(np.arange(0,len(b),1))
    return a2

print(timeit.timeit("np.searchsorted(b, a)", globals=globals(), number=2))
print(timeit.timeit("o1()", globals=globals(), number=2))
print(timeit.timeit("o2()", globals=globals(), number=2))

Output (seconds for np.searchsorted, o1() and o2(), in that order):

0.061956800000189105
0.012765400000716909
2.220097600000372
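
If b is not sorted, searchsorted can still be used by sorting b indirectly and passing the permutation through its sorter argument. This is a minimal sketch rather than a benchmarked solution; the function name indices_in is made up for illustration, and it raises an error if some value of a is missing from b.

```python
import numpy as np

def indices_in(a, b):
    """Return, for each element of `a`, its index in the (unsorted) 1-D array `b`.

    Hypothetical helper (not part of NumPy); raises ValueError if any
    element of `a` does not occur in `b`.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    order = np.argsort(b)                      # permutation that sorts b
    pos = np.searchsorted(b, a, sorter=order)  # positions within sorted b
    pos = np.clip(pos, 0, len(b) - 1)          # values above b.max() land at len(b)
    idx = order[pos]                           # map back to positions in original b
    if not np.array_equal(b[idx], a):
        raise ValueError("some values of a are not present in b")
    return idx

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([77, 10, 51, 15, 65, 20, 32, 30])  # same values as before, shuffled

result = indices_in(a, b)
print(result)
```

Unlike the lookup table, the memory used here does not depend on the range of the values, only on the sizes of a and b, at the cost of a binary search per element.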
