How to efficiently shuffle some values of a numpy array while keeping their relative order?

I have a numpy array and a mask that specifies which entries of the array should be shuffled while keeping their relative order. Here is an example:

In [2]: arr = np.array([5, 3, 9, 0, 4, 1])

In [4]: mask = np.array([True, False, False, False, True, True])

In [5]: arr[mask]
Out[5]: array([5, 4, 1]) # These entries shall be shuffled inside arr, while keeping their order.

In [6]: np.where(mask==True)
Out[6]: (array([0, 4, 5]),)

In [7]: shuffle_array(arr, mask)  # I'm looking for an efficient realization of this function!
Out[7]: array([3, 5, 4, 9, 0, 1]) # See how the entries 5, 4 and 1 haven't changed their order.

I have already written some code that does this, but it is really slow.

import numpy as np
def shuffle_array(arr, mask):
    perm = np.arange(len(arr))  # permutation array
    n = mask.sum()
    if n > 0:
        old_true_pos = np.where(mask == True)[0]  # old positions for which mask is True
        old_false_pos = np.where(mask == False)[0] # old positions for which mask is False

        new_true_pos = np.random.choice(perm, n, replace=False)  # draw new positions
        new_true_pos.sort()
        new_false_pos = np.setdiff1d(perm, new_true_pos)

        new_pos = np.hstack((new_true_pos, new_false_pos))
        old_pos = np.hstack((old_true_pos, old_false_pos))
        perm[new_pos] = perm[old_pos]

    return arr[perm]

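To make matters worse, I actually have two large matrices A and B of shape (M,N). Matrix A contains arbitrary values, while each row of matrix B is a mask used to shuffle the corresponding row of matrix A according to the procedure outlined above. So what I want is shuffled_matrix = row_wise_shuffle(A, B).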

So far, the only way I have found is to call the shuffle_array() function inside a for loop.

Can you think of any numpy'onic way to accomplish this task that avoids the loop? Thanks in advance!

For the 1d case:

import numpy as np

a = np.arange(8)
b = np.array([1,1,1,1,0,0,0,0])
# Get ordered values
ordered_values = a[np.where(b==1)]
# We'll shuffle both arrays
shuffled_ix = np.random.permutation(a.shape[0])
a_shuffled = a[shuffled_ix]
b_shuffled = b[shuffled_ix]
# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = ordered_values
a_shuffled # Notice that 0, 1, 2, 3 preserves order.

>>>
array([0, 1, 2, 6, 3, 4, 7, 5])
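
The 1d recipe above can be wrapped into a reusable function. The following is only a minimal sketch: the name `shuffle_keep_order_1d` and the optional `rng` argument (a `numpy.random.Generator`, used here for reproducibility) are assumptions and not part of the original post. Note that with this recipe the unmasked entries end up in arbitrary order.

```python
import numpy as np

def shuffle_keep_order_1d(arr, mask, rng=None):
    """Shuffle arr; entries selected by mask keep their relative order.

    Hypothetical helper: name and signature are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr)
    mask = np.asarray(mask, dtype=bool)
    ordered_values = arr[mask]            # masked values in original order
    perm = rng.permutation(arr.shape[0])  # one permutation applied to both arrays
    arr_shuffled = arr[perm]              # fancy indexing returns a copy
    mask_shuffled = mask[perm]
    # Overwrite the shuffled masked slots, left to right, with the ordered values.
    arr_shuffled[mask_shuffled] = ordered_values
    return arr_shuffled, mask_shuffled

arr = np.array([5, 3, 9, 0, 4, 1])
mask = np.array([True, False, False, False, True, True])
out, new_mask = shuffle_keep_order_1d(arr, mask, np.random.default_rng(0))
```

Reading `out[new_mask]` should give back `5, 4, 1` in that order, whatever positions they landed in.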

For the 2d case, shuffling column-wise (along axis=1):


import numpy as np

a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])

# The code below works for column shuffle (i.e. axis=1).
# Get ordered values
i,j = np.where(b==1)
values = a[i, j]
values

# We'll shuffle both arrays for axis=1
# taken from 
idx = np.random.rand(*a.shape).argsort(axis=1)
a_shuffled = np.take_along_axis(a,idx,axis=1)
b_shuffled = np.take_along_axis(b,idx,axis=1)

# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = values

# Get the result
a_shuffled # see that 4,5 | 6,7,8 | 12,13,14,15 | 20, 21 preserves order
>>>
array([[ 4,  1,  0,  3,  2,  5],
       [ 9,  6,  7, 11,  8, 10],
       [12, 13, 16, 17, 14, 15],
       [23, 20, 19, 22, 21, 18]])
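
For the (M,N) matrices A and B from the question, the axis=1 recipe above could be packaged roughly as follows. This is a sketch under assumptions: the name `row_wise_shuffle` is borrowed from the question, but the signature, the `rng` argument, and the decision to also return the shuffled mask are illustrative choices, not something the original post specifies.

```python
import numpy as np

def row_wise_shuffle(A, B, rng=None):
    """Shuffle each row of A; entries where B is True keep their order within the row."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A)
    B = np.asarray(B, dtype=bool)
    values = A[B]  # masked values, row by row, in original order
    # Sorting random keys along axis=1 gives an independent permutation per row.
    idx = rng.random(A.shape).argsort(axis=1)
    A_shuffled = np.take_along_axis(A, idx, axis=1)
    B_shuffled = np.take_along_axis(B, idx, axis=1)
    # Each row has the same number of True entries before and after the shuffle,
    # so the row-major assignment lines up row by row.
    A_shuffled[B_shuffled] = values
    return A_shuffled, B_shuffled

A = np.arange(24).reshape(4, 6)
B = np.array([[0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 0, 0]], dtype=bool)
shuffled, new_B = row_wise_shuffle(A, B, np.random.default_rng(0))
```

The argsort costs O(N log N) per row, but everything runs inside NumPy, so no Python-level loop over the M rows is needed.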

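For the 2d case with row-wise shuffling (along axis=0), we can use the same code: first transpose the arrays, shuffle, and then transpose the result back: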


import numpy as np

a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])

# The shuffle code below works along axis=1 (column shuffle).
# To shuffle along axis=0 instead, we first transpose both arrays.
at = a.T
bt = b.T

# Get ordered values
i,j = np.where(bt==1)
values = at[i, j]
values

# We'll shuffle both arrays for axis=1
# taken from 
idx = np.random.rand(*at.shape).argsort(axis=1)
at_shuffled = np.take_along_axis(at,idx,axis=1)
bt_shuffled = np.take_along_axis(bt,idx,axis=1)

# Replace the values with correct order
at_shuffled[np.where(bt_shuffled==1)] = values

# Get the result
a_shuffled = at_shuffled.T
a_shuffled # see that 6,12 | 7, 13 | 8,14,20 | 15, 21 preserves order
>>>
array([[ 6,  7,  2,  3, 10, 17],
       [18, 19,  8, 15, 16, 23],
       [12, 13, 14, 21,  4,  5],
       [ 0,  1, 20,  9, 22, 11]])
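
One difference from the question is worth noting: in the question's expected output the unmasked entries (3, 9, 0) also keep their relative order, whereas the recipes above shuffle them freely. If that matters, one possible variant for the per-row case is to shuffle only the mask and then fill both groups in order. The sketch below assumes NumPy 1.20 or later for `Generator.permuted`; the function name `row_wise_shuffle_stable` is hypothetical.

```python
import numpy as np

def row_wise_shuffle_stable(A, B, rng=None):
    """Move masked entries to random slots per row; both groups keep their order."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A)
    B = np.asarray(B, dtype=bool)
    # Shuffle each row of the mask independently to pick the new masked slots.
    new_B = rng.permuted(B, axis=1)
    out = np.empty_like(A)
    # Boolean indexing runs in row-major order, and every row has the same
    # number of True entries in B and new_B, so rows stay aligned.
    out[new_B] = A[B]
    out[~new_B] = A[~B]
    return out, new_B

A = np.arange(24).reshape(4, 6)
B = np.array([[0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 0, 0]], dtype=bool)
out, new_B = row_wise_shuffle_stable(A, B, np.random.default_rng(0))
```

This avoids the argsort entirely and touches each element once, at the cost of producing only those arrangements in which both groups stay ordered.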