Efficient way to extract non-None arrays from a NumPy ndarray

My sincere apologies in advance if this question seems long and basic.

Given:

import numpy as np
import time

c, q = int(3e5), int(5e5)    
a = np.full( (c,q,3), None )

# fill in some non-None entries: 3D (x, y, z) positions
a[0,0, :] = np.array([-4,0.1,0])
a[0,1, :] = np.array([9.2,3.1,0])
a[0,5, :] = np.array([3,-4.3,0])
a[0,6, :] = np.array([-1,12.8,0])

a[2,1, :] = np.array([4.5,-9,0])
a[2,3, :] = np.array([-0.1,6.1,0])
a[2,8, :] = np.array([-7,1,0])

a[3,0, :] = np.array([-1,0.7,0])
a[3,6, :] = np.array([-15,26,0])

a[5,0, :] = np.array([0.1,-1.1,0])

a[7,5, :] = np.array([0,0,0])

a[8,2, :] = np.array([5,6,0])

a[9,10, :] = np.array([-1.1,1,0])

a[10,3, :] = np.array([-32,15,0])

a[11,7, :] = np.array([0,9.3,0])

a[12,2, :] = np.array([0.9,6.2,0])

a[14,9, :] = np.array([8.6,5.6,0])

a[15,5, :] = np.array([0.5,8.5,0])

Goal:

I want to extract the non-None elements from a. My current code below is very slow and inefficient because it relies on plain for loops:

bt = time.time()
for ci in range(c):
    if any(ci == value for value in [2, 5]):
        print(f">> Generating {ci}+ ranks ...")
        poseNplus = []
        aNplus = a[ci:]
        for ci_i in range(aNplus.shape[0]):
            aNplus_Q = aNplus[ci_i]
            for qi in range(aNplus_Q.shape[0]):
                if all(aNplus_Q[qi] != None):
                    poseNplus.append( aNplus_Q[qi] )
        print(len(poseNplus), poseNplus)
et = time.time()
print(f"Took {(et-bt):.3f} s")

This takes a long time:

Took 580.888 s

Following @Marc Felix's answer, I can extract all the non-None triplets as follows. First change the initialization to a = np.full( (c,q,3), np.nan ), then:

bt = time.time()
nan_values = np.any(np.isnan(a), axis=-1)
result = a[nan_values==False].reshape((-1, 3))
et = time.time()
print(f"Took {(et-bt):.3f} s")
print(result.shape)
print(result)

which returns:

Took 0.318 s
(18, 3)
[[ -4.    0.1   0. ]
 [  9.2   3.1   0. ]
 [  3.   -4.3   0. ]
 [ -1.   12.8   0. ]
 [  4.5  -9.    0. ] <<<--- rank2 - END: from here till end
 [ -0.1   6.1   0. ]
 [ -7.    1.    0. ]
 [ -1.    0.7   0. ]
 [-15.   26.    0. ]
 [  0.1  -1.1   0. ] <<<--- rank5 - END: from here till end
 [  0.    0.    0. ]
 [  5.    6.    0. ]
 [ -1.1   1.    0. ]
 [-32.   15.    0. ]
 [  0.    9.3   0. ]
 [  0.9   6.2   0. ]
 [  8.6   5.6   0. ]
 [  0.5   8.5   0. ]]

But the result I want should look like this:

>> Generating 2+ ranks ...
[[  4.5  -9.    0. ]
 [ -0.1   6.1   0. ]
 [ -7.    1.    0. ]
 [ -1.    0.7   0. ]
 [-15.   26.    0. ]
 [  0.1  -1.1   0. ]
 [  0.    0.    0. ]
 [  5.    6.    0. ]
 [ -1.1   1.    0. ]
 [-32.   15.    0. ]
 [  0.    9.3   0. ]
 [  0.9   6.2   0. ]
 [  8.6   5.6   0. ]
 [  0.5   8.5   0. ]]
------------------------------------------------------------
>> Generating 5+ ranks ...
[[  0.1  -1.1   0. ]
 [  0.    0.    0. ]
 [  5.    6.    0. ]
 [ -1.1   1.    0. ]
 [-32.   15.    0. ]
 [  0.    9.3   0. ]
 [  0.9   6.2   0. ]
 [  8.6   5.6   0. ]
 [  0.5   8.5   0. ]]
------------------------------------------------------------

Question:

Is there a more time-efficient way to do this?

I am aware of this post, but its approach gives a flattened result:

b = a[a != None]
print(b)

[-4.0 0.1 0.0 9.2 3.1 0.0 3.0 -4.3 0.0 -1.0 12.8 0.0 4.5 -9.0 0.0 -0.1 6.1
 0.0 -7 1 0 -1.0 0.7 0.0 -15 26 0 0.1 -1.1 0.0 0 0 0 5 6 0 -1.1 1.0 0.0
 -32 15 0 0.0 9.3 0.0 0.9 6.2 0.0 8.6 5.6 0.0 0.5 8.5 0.0]
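
The flat output above follows from the shape of the mask: a != None is evaluated element by element, so the mask has shape (c, q, 3) and indexing with it returns a 1-D array. A minimal sketch of a row-wise variant for the object-dtype (None-filled) array is shown below. It assumes every triplet is either fully set or fully None, and it uses a much smaller array than the one in the question; the names mask and rows are only illustrative.

```python
import numpy as np

# Small object-dtype array filled with None (demo size, not the question's size).
a = np.full((4, 3, 3), None)
a[0, 0, :] = np.array([-4, 0.1, 0])
a[2, 1, :] = np.array([4.5, -9, 0])
a[2, 2, :] = np.array([-0.1, 6.1, 0])

# Element-wise mask of shape (4, 3, 3): indexing with it flattens the result.
flat = a[a != None]

# Collapse the last axis so the mask has shape (4, 3): one flag per triplet.
mask = (a != None).all(axis=-1)

# Indexing with a (4, 3) mask keeps the trailing axis, giving shape (n, 3).
rows = a[mask].astype(float)

print(flat.shape)   # 1-D
print(rows.shape)   # 2-D, one row per (x, y, z) position
print(rows)
```

Comparisons on an object array still run at Python speed for each element, so switching the fill value to np.nan with a float dtype is likely to be faster on large inputs.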

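You can use np.isnan() to detect NaN values. This assumes a is a float array filled with np.nan rather than None. It looks like this: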

nan_values = np.any(np.isnan(a), axis=-1)

Then the following should give you the correct result:

result = a[nan_values==False].reshape((-1, 3))
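
As a self-contained illustration of the two lines above, here is a minimal sketch on a small NaN-filled array (the sizes and the three sample positions are chosen only for the demo). It also shows that ~nan_values can be used in place of nan_values==False, and that the reshape is not strictly needed because a 2-D boolean mask already preserves the last axis.

```python
import numpy as np

# Small float array filled with NaN (demo size only).
a = np.full((4, 5, 3), np.nan)
a[0, 0] = [-4, 0.1, 0]
a[0, 1] = [9.2, 3.1, 0]
a[2, 3] = [-0.1, 6.1, 0]

# True wherever a triplet contains at least one NaN; shape (4, 5).
nan_values = np.isnan(a).any(axis=-1)

# Keep the triplets without NaN; the result already has shape (n, 3).
result = a[~nan_values]

print(nan_values.shape)
print(result.shape)
print(result)
```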

Adapting @Marc Felix's answer, with a created via np.full and np.nan as in the asker's update:

nan_values = np.any(np.isnan(a[2:]), axis=-1)
result = a[2:][nan_values==False].reshape((-1, 3))
print(f">> Generating {2}+ ranks ...\n", result, '\n ------------------------------------------------------------')

nan_values = np.any(np.isnan(a[5:]), axis=-1)
result = a[5:][nan_values==False].reshape((-1, 3))
print(f">> Generating {5}+ ranks ...\n", result, '\n ------------------------------------------------------------')

This gives the expected result.
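
If several starting ranks are needed, the NaN mask does not have to be recomputed for each one. The sketch below is one possible variation, not part of the original answers: it builds the mask and the filtered array once, then uses a cumulative count of valid triplets per rank to slice out the "rank N and above" part. The names valid, offsets and ranks_from are hypothetical, and the array is much smaller than the one in the question.

```python
import numpy as np

# Small NaN-filled demo array: 8 ranks, 6 slots per rank.
a = np.full((8, 6, 3), np.nan)
a[0, 0] = [-4, 0.1, 0]
a[2, 1] = [4.5, -9, 0]
a[2, 3] = [-0.1, 6.1, 0]
a[5, 0] = [0.1, -1.1, 0]
a[7, 5] = [0, 0, 0]

# One pass over the data: which (rank, slot) pairs hold a full triplet.
valid = ~np.isnan(a).any(axis=-1)      # shape (8, 6)
result = a[valid]                      # shape (n_valid, 3), in row-major order

# offsets[k] = number of valid triplets in ranks 0 .. k-1.
offsets = np.concatenate(([0], np.cumsum(valid.sum(axis=1))))

def ranks_from(start):
    """Return all valid triplets from rank `start` onward (a view of result)."""
    return result[offsets[start]:]

for start in (2, 5):
    print(f">> Generating {start}+ ranks ...")
    print(ranks_from(start))
    print("-" * 60)
```

Because boolean indexing returns rows in row-major order, each slice matches what a[start:] filtered on its own would give, while the expensive np.isnan pass runs only once.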