Apply argsort per row in array skipping certain elements based on threshold - NumPy / Python
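I would like to apply a sort operation row by row, keeping only the values above a given threshold.
For that, I saw that I can use a masked array to apply the threshold.
However, argsort still takes the masked values (those below the threshold) into account and replaces them with fill_value.
If a value has been masked out, I don't want any result for it at all; I want NaN in its place.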
import numpy as np

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])
threshold = 0.5
m_a = np.ma.masked_less_equal(a, threshold)
argsorted = m_a.argsort(-1)
This gives me:
array([[0, 2, 1],
[1, 0, 2],
[0, 1, 2],
[0, 1, 2],
[1, 2, 0],
[2, 0, 1]])
But I would like to get:
array([[0, NaN, 1],
[1, 0, NaN],
[0, NaN, NaN],
[NaN, NaN, NaN],
[NaN, 0, 1],
[ 1, NaN, 0]])
Any idea how to get this result?
Thanks for your help!
We can add one more argsort to get the desired output more easily:
sidx = argsorted.argsort(1)
mask = sidx >= (a.shape[1]-m_a.mask.sum(1,keepdims=True))
out = np.where(mask,np.nan,sidx)
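For reference, here is a self-contained sketch of the double-argsort idea applied to the data from the question. It uses `np.ma.getmaskarray(m_a)` rather than `m_a.mask` so that the per-row count should still work when no element happens to be masked; that substitution and the name `n_masked` are assumptions added here, not part of the snippet above.

```python
import numpy as np

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])
threshold = 0.5

# Mask everything at or below the threshold; masked entries sort to the end.
m_a = np.ma.masked_less_equal(a, threshold)
argsorted = m_a.argsort(-1)

# A second argsort turns the sorting permutation into per-element ranks.
sidx = argsorted.argsort(1)

# Number of masked elements per row (full boolean mask, even if nothing is masked).
n_masked = np.ma.getmaskarray(m_a).sum(1, keepdims=True)

# Ranks that fall into the "masked tail" of each row become NaN.
mask = sidx >= (a.shape[1] - n_masked)
out = np.where(mask, np.nan, sidx)
print(out)
```

The first argsort gives, for each sorted position, the column it came from; the second one inverts that permutation, which is why the result holds the rank of each element in its original position.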
We could also avoid masked arrays from the start:
def thresholded_argsort(a, threshold):
    m = a < threshold
    ac = a.copy()
    ac[m] = ac.max() + 1
    sidx = ac.argsort(1).argsort(1)
    mask = sidx >= (ac.shape[1] - m.sum(1, keepdims=True))
    return np.where(mask, np.nan, sidx)
Sample run:
In [46]: a
Out[46]:
array([[0.522235, 0.12827 , 0.708973],
[0.994557, 0.844426, 0.366608],
[0.986669, 0.143659, 0.395891],
[0.291339, 0.421843, 0.278869],
[0.250303, 0.861475, 0.904534],
[0.973436, 0.360466, 0.751913]])
In [47]: thresholded_argsort(a, threshold=0.5)
Out[47]:
array([[ 0., nan, 1.],
[ 1., 0., nan],
[ 0., nan, nan],
[nan, nan, nan],
[nan, 0., 1.],
[ 1., nan, 0.]])
Note: we can avoid the extra argsort by using array assignment, which improves performance. For a 2D array along the second axis, it would be:
def argsort_unique2D(idx):
    m, n = idx.shape
    idx_out = np.empty((m, n), dtype=int)
    np.put_along_axis(idx_out, idx, np.arange(n), axis=1)
    return idx_out
Hence, in the solutions posted earlier, argsorted.argsort(1) could be replaced by argsort_unique2D(argsorted), and ac.argsort(1).argsort(1) could be replaced by argsort_unique2D(ac.argsort(1)).
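As a quick sanity check, the sketch below compares `argsort_unique2D` with a plain second `argsort` on random data. The array `x`, the seed, and the variable `same` are illustrative assumptions; the point is only that scattering `0..n-1` into the positions given by each row of `idx` inverts that row's permutation.

```python
import numpy as np

def argsort_unique2D(idx):
    # idx holds one permutation of 0..n-1 per row (e.g. the output of argsort).
    m, n = idx.shape
    idx_out = np.empty((m, n), dtype=int)
    # Write rank j at column idx[i, j] of row i, i.e. invert each row's permutation.
    np.put_along_axis(idx_out, idx, np.arange(n), axis=1)
    return idx_out

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
x = rng.random((5, 4))
idx = x.argsort(1)

# Both expressions compute the inverse permutation of each row of idx.
same = np.array_equal(argsort_unique2D(idx), idx.argsort(1))
print(same)
```

The scatter is a single linear pass per row, whereas the second `argsort` has to sort again, which is where the performance gain is expected to come from.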
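If I understand correctly, you don't want the NaN values to be considered when sorting. In that case, I'm not sure about the logic behind your expected result. You can try the following code; I believe this is what you are looking for: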
import numpy as np

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])
threshold = 0.5
m_a = np.ma.masked_less_equal(a, threshold).filled(np.nan)
result = np.where(
    np.isnan(m_a),
    np.nan, m_a.argsort(-1)
)
result
It should give you the following result:
array([[ 0., nan, 1.],
[ 1., 0., nan],
[ 0., nan, nan],
[nan, nan, nan],
[nan, 2., 0.],
[ 2., nan, 1.]])
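The output above differs from the expected one in the last two rows. One way to read the difference: `m_a.argsort(-1)` returns, for each sorted position, the index of the element that belongs there, whereas the question asks for the rank of each element within its row. A possible adjustment, sketched here on the assumption that NaN values are sorted to the end of each row (which is how NumPy orders them), is to apply `argsort` a second time before masking. The names `filled` and `ranks` are illustrative.

```python
import numpy as np

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])
threshold = 0.5

# Replace values at or below the threshold with NaN.
filled = np.ma.masked_less_equal(a, threshold).filled(np.nan)

# argsort gives the sorting permutation; a second argsort gives each element's rank.
ranks = filled.argsort(-1).argsort(-1)

# Keep ranks only where the original value survived the threshold.
result = np.where(np.isnan(filled), np.nan, ranks)
print(result)
```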
Hope this helps!
import numpy as np

a = np.array([[0.522235, 0.128270, 0.708973],
              [0.994557, 0.844426, 0.366608],
              [0.986669, 0.143659, 0.395891],
              [0.291339, 0.421843, 0.278869],
              [0.250303, 0.861475, 0.904534],
              [0.973436, 0.360466, 0.751913]])
threshold = .5

def tri(ligne):
    # Sort the row, pushing values below the threshold to the end
    s = sorted(ligne, key=lambda x: float('inf') if x < threshold else x)
    # The rank of each element is its position in the sorted row
    nv_liste = [s.index(v) for v in ligne]
    # Elements below the threshold get NaN instead of a rank
    for i in range(len(ligne)):
        if ligne[i] < threshold:
            nv_liste[i] = np.nan
    return nv_liste

np.apply_along_axis(tri, 1, a)
This gives:
array([[ 0., nan, 1.],
[ 1., 0., nan],
[ 0., nan, nan],
[nan, nan, nan],
[nan, 0., 1.],
[ 1., nan, 0.]])