Apply argsort per row in array skipping certain elements based on threshold - NumPy / Python

Question

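I would like to apply a sort operation row by row, keeping only the values above a given threshold.

For this, I have seen that I can use a masked array to apply the threshold. However, argsort still takes the masked values (those at or below the threshold) into account and replaces them with a fill_value.

But if a value has been masked out (replaced by NaN), I don't want any result for it at all.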

import numpy as np

a = np.array([[0.522235,0.128270,0.708973],
              [0.994557,0.844426,0.366608],
              [0.986669,0.143659,0.395891],
              [0.291339,0.421843,0.278869],
              [0.250303,0.861475,0.904534],
              [0.973436,0.360466,0.751913]])

threshold = 0.5
m_a = np.ma.masked_less_equal(a, threshold)
argsorted = m_a.argsort(-1)

This gives me:

array([[0, 2, 1],
       [1, 0, 2],
       [0, 1, 2],
       [0, 1, 2],
       [1, 2, 0],
       [2, 0, 1]])

But I would like to get:

array([[0,   NaN,   1],
       [1,     0, NaN],
       [0,   NaN, NaN],
       [NaN, NaN, NaN],
       [NaN,   0,   1],
       [  1, NaN,   0]])

Any idea how to get this result?

Thanks for your help! Best,

Answer 1

We can add one more argsort on top of the masked argsort to get the desired output more easily -

sidx = argsorted.argsort(1)
mask = sidx >= (a.shape[1]-m_a.mask.sum(1,keepdims=True))
out = np.where(mask,np.nan,sidx)
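
To make the idea above easier to follow, here is a self-contained sketch of the same steps. The first argsort gives the column indices that sort each row (with masked entries placed last), the second argsort turns those indices into per-row ranks, and any rank at or beyond the number of unmasked elements belongs to a masked entry. One deviation from the snippet above, which is my own assumption and not part of the original answer: it uses np.ma.getmaskarray instead of m_a.mask, so that it also works when no element is masked at all.

```python
import numpy as np

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])
threshold = 0.5
m_a = np.ma.masked_less_equal(a, threshold)

# First argsort: column indices that would sort each row, masked entries last
argsorted = m_a.argsort(-1)
# Second argsort: rank of every element within its row
ranks = argsorted.argsort(1)

# getmaskarray always returns a full boolean array, even if nothing is masked
n_masked = np.ma.getmaskarray(m_a).sum(1, keepdims=True)
# Ranks at or beyond the count of kept elements belong to masked entries
out = np.where(ranks >= a.shape[1] - n_masked, np.nan, ranks)
print(out)
```

Because masked entries always receive the highest ranks in a row, the order in which ties among them are resolved does not affect the result.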

We can also avoid masked arrays altogether -

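def thresholded_argsort(a, threshold):
    # Elements at or below the threshold are excluded
    # (<= matches np.ma.masked_less_equal in the question)
    m = a <= threshold
    ac = a.copy()
    # Push excluded elements past every kept value so they sort last
    ac[m] = ac.max() + 1
    # Double argsort turns sorting indices into per-row ranks
    sidx = ac.argsort(1).argsort(1)
    # Ranks at or beyond the number of kept elements belong to excluded entries
    mask = sidx >= (ac.shape[1] - m.sum(1, keepdims=True))
    return np.where(mask, np.nan, sidx)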

Sample run -

In [46]: a
Out[46]: 
array([[0.522235, 0.12827 , 0.708973],
       [0.994557, 0.844426, 0.366608],
       [0.986669, 0.143659, 0.395891],
       [0.291339, 0.421843, 0.278869],
       [0.250303, 0.861475, 0.904534],
       [0.973436, 0.360466, 0.751913]])

In [47]: thresholded_argsort(a, threshold=0.5)
Out[47]: 
array([[ 0., nan,  1.],
       [ 1.,  0., nan],
       [ 0., nan, nan],
       [nan, nan, nan],
       [nan,  0.,  1.],
       [ 1., nan,  0.]])
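
Note that the result above is a float array, since NaN cannot be stored in an integer array. If integer ranks are preferred, one possible variant is to return a masked integer array instead. This is only a sketch under my own assumptions: the function name thresholded_argsort_masked is hypothetical, and the input is assumed to contain only finite values so that np.inf can act as the "sort last" placeholder.

```python
import numpy as np

def thresholded_argsort_masked(a, threshold):
    # Hypothetical helper, not from the original answer.
    # Entries at or below the threshold are excluded from the ranking.
    m = a <= threshold
    # Assumes finite input: np.inf pushes excluded entries to the end of each row
    ac = np.where(m, np.inf, a)
    # Double argsort gives the rank of each element within its row
    ranks = ac.argsort(1).argsort(1)
    # Keep integer dtype and hide the excluded positions behind a mask
    return np.ma.masked_array(ranks, mask=m)

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])

out = thresholded_argsort_masked(a, threshold=0.5)
print(out)
```

Calling out.filled(-1) would then give a plain integer array with -1 at the excluded positions.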

Note: we can improve performance by replacing the extra argsort with an array assignment. For a 2D array sorted along the second axis, that would be -

def argsort_unique2D(idx):
    m,n = idx.shape
    idx_out = np.empty((m,n),dtype=int)
    np.put_along_axis(idx_out, idx, np.arange(n), axis=1)
    return idx_out

Hence, in the solutions posted earlier, argsorted.argsort(1) can be replaced with argsort_unique2D(argsorted), and ac.argsort(1).argsort(1) with argsort_unique2D(ac.argsort(1)).
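
The replacement works because each row of an argsort result is a permutation of 0..n-1, and argsorting a permutation yields its inverse. The inverse can be built directly by scattering: out[i, idx[i, j]] = j, which avoids a second sort. A small sanity check on random data (the test array and seed are arbitrary choices of mine) could look like this:

```python
import numpy as np

def argsort_unique2D(idx):
    # idx: 2D integer array whose rows are permutations of 0..n-1
    m, n = idx.shape
    idx_out = np.empty((m, n), dtype=int)
    # Scatter: idx_out[i, idx[i, j]] = j, i.e. the inverse permutation per row
    np.put_along_axis(idx_out, idx, np.arange(n), axis=1)
    return idx_out

rng = np.random.default_rng(0)
x = rng.random((5, 7))
idx = x.argsort(1)

# The scatter-based inverse should agree with the second argsort
same = np.array_equal(argsort_unique2D(idx), idx.argsort(1))
print(same)
```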

Answer 2

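If I understand correctly, you don't want the NaN values to be considered when sorting. In that case, I'm not sure about the logic behind your expected result. You can try the following code. I believe this is what you are looking for:-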

import numpy as np
a = np.array([[0.522235,0.128270,0.708973],
              [0.994557,0.844426,0.366608],
              [0.986669,0.143659,0.395891],
              [0.291339,0.421843,0.278869],
              [0.250303,0.861475,0.904534],
              [0.973436,0.360466,0.751913]])

threshold = 0.5
m_a = np.ma.masked_less_equal(a, threshold).filled(np.nan)
result = np.where(
        np.isnan(m_a),
        np.nan, m_a.argsort(-1)
    )
result

It should give you the following result:-

array([[ 0., nan,  1.],
       [ 1.,  0., nan],
       [ 0., nan, nan],
       [nan, nan, nan],
       [nan,  2.,  0.],
       [ 2., nan,  1.]])

Hope this helps!!
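
The last two rows of this output differ from the result requested in the question because np.where is keeping the argsort indices (which column holds the k-th smallest value) rather than the rank of each element. The following sketch contrasts the two on the same data. The second argsort is a possible adjustment of my own, not part of the answer above, and the variable names are illustrative.

```python
import numpy as np

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])
threshold = 0.5

# Replace excluded values by NaN; NaN is sorted to the end of each row
filled = np.where(a <= threshold, np.nan, a)
order = filled.argsort(-1)   # sorting indices, as used in the answer above
ranks = order.argsort(-1)    # rank of each element within its row

by_index = np.where(np.isnan(filled), np.nan, order)
by_rank = np.where(np.isnan(filled), np.nan, ranks)

print(by_index[4])  # sorting indices kept at the unmasked positions
print(by_rank[4])   # ranks kept at the unmasked positions
```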

Answer 3

import numpy as np

a = np.array([[0.522235,0.128270,0.708973],
              [0.994557,0.844426,0.366608],
              [0.986669,0.143659,0.395891],
              [0.291339,0.421843,0.278869],
              [0.250303,0.861475,0.904534],
              [0.973436,0.360466,0.751913]])

threshold = .5


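def tri(ligne):
    # Sort the row, sending values below the threshold to the end
    s = sorted(ligne, key=lambda x: float('inf') if x < threshold else x)
    # The rank of each value is its position in the sorted row
    nv_liste = [s.index(v) for v in ligne]
    for i in range(len(ligne)):
        if ligne[i] < threshold:
            nv_liste[i] = np.nan
    return nv_liste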

np.apply_along_axis(tri, 1, a)

This gives you:

array([[ 0., nan,  1.],
       [ 1.,  0., nan],
       [ 0., nan, nan],
       [nan, nan, nan],
       [nan,  0.,  1.],
       [ 1., nan,  0.]])
