Transform all elements positionally below 0 into 0 in a matrix (Python)

Question

Here is a matrix:

matrix = [[1, 1, 1, 0], 
          [0, 5, 0, 1], 
          [2, 1, 3, 10]]

I want to change every element located below a 0 (in the same column) to 0.

The resulting matrix would be:

matrix = [[1, 1, 1, 0], 
          [0, 5, 0, 0], 
          [0, 1, 0, 0]]

This is what I have tried so far. It returns nothing.

import numpy as np
def transform(matrix):
    newmatrix = np.asarray(matrix)
    i = 0
    j = 0
    for j in range(0,len(matrix[0])-1):
        while i < int(len(matrix))-1 and j < int(len(matrix[0]))-1:
            if newmatrix[i][j] == 0:
                np.put(newmatrix,newmatrix[i+1][j], 0 )
        i +=1
    return print (newmatrix)

Answer 1

Here is a simple (though not optimized) algorithm:

import numpy as np
from numba import jit

m = np.array([[1, 1, 1, 0], 
              [0, 5, 0, 1], 
              [2, 1, 3, 10]])

@jit(nopython=True)
def zeroer(m):
    a, b = m.shape
    for j in range(b):
        for i in range(a):
            if m[i, j] == 0:
                m[i:, j] = 0
                break
    return m

zeroer(m)

# [[1 1 1 0]
#  [0 5 0 0]
#  [0 1 0 0]]
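
For readers without numba installed, the same loop also runs as plain Python over a NumPy array, only slower. The following is a minimal sketch of that idea, not part of the original answer: the name `zeroer_plain` is hypothetical, and working on a copy is an added assumption so that the input stays unchanged.

```python
import numpy as np

def zeroer_plain(m):
    # Hypothetical numba-free variant of zeroer; works on a copy of the input.
    out = np.array(m)  # np.array copies by default
    rows, cols = out.shape
    for j in range(cols):
        for i in range(rows):
            if out[i, j] == 0:
                out[i:, j] = 0  # zero this entry and everything below it
                break           # the rest of the column is already handled
    return out

m = np.array([[1, 1, 1, 0],
              [0, 5, 0, 1],
              [2, 1, 3, 10]])
result = zeroer_plain(m)
print(result)
```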

Answer 2

Method 1 (original)

import numpy as np
def transform(matrix):
    mat = np.asarray(matrix)
    mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
    # Alternatively:
    # mat[~(mat != 0).cumprod(axis=0, dtype=bool)] = 0
    # or,
    # mat[~((mat != 0).cumprod(axis=0, dtype=bool))] = 0
    return mat

Then, for your sample data, I get the following mat:

In [195]: matrix = [[1, 1, 1, 0], 
     ...:           [0, 5, 0, 1], 
     ...:           [2, 1, 3, 10]]

In [196]: transform(matrix)
Out[196]: 
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])

Method 2 (further optimized)

def transform2(matrix):
    mat = np.asarray(matrix)
    mat *= (mat != 0).cumprod(axis=0, dtype=bool)
    return mat

Method 3 (even more optimized)

def transform3(matrix):
    mat = np.asarray(matrix)
    mat *= mat.cumprod(axis=0, dtype=bool)
    return mat
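
One detail worth keeping in mind with all three methods: `np.asarray` does not copy when its argument is already an ndarray, so the function then overwrites the caller's array. The sketch below illustrates this with Method 2. The variable names are illustrative only, and the built-in `bool` is used as the dtype.

```python
import numpy as np

def transform2(matrix):
    mat = np.asarray(matrix)  # no copy if matrix is already an ndarray
    mat *= (mat != 0).cumprod(axis=0, dtype=bool)
    return mat

original = np.array([[1, 1, 1, 0],
                     [0, 5, 0, 1],
                     [2, 1, 3, 10]])
backup = original.copy()

safe = transform2(original.copy())  # original is left untouched here
unchanged_after_safe_call = np.array_equal(original, backup)

aliased = transform2(original)      # original itself is overwritten here
print(unchanged_after_safe_call)    # True
print(aliased is original)          # True
```

Passing a list, as in the sample session, avoids the issue because `np.asarray` has to build a new array from it.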

Explanation

Let's look at the main statement (in Method 1):

mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0

We can split it into several "elementary" operations:

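1. Create a boolean mask that is False (numerically 0) where the elements of mat are 0 and True (numerically 1) where they are non-zero:
```
mask1 = np.not_equal(mat, 0)
```
2. Using the fact that False is numerically 0, apply the cumprod() function (a good explanation can be found here: https://www.mathworks.com/help/matlab/ref/cumprod.html):
```
mask2 = mask1.cumprod(axis=0)
```
Since 1*1==1 and 0*0 or 0*1 is 0, all elements of this "mask" are either 0 or 1. They are 0 only at the locations where mask1 is zero and below (!), because of the "cumulative nature" of the product along the columns (hence axis=0).
3. Now we want to set to 0 those elements of mat that correspond to a 0 in mask2. To do this, we create a boolean mask that is True where mask2 is 0 and False elsewhere. This is easily achieved by applying logical (or binary) NOT to mask2:
```
mask3 = np.logical_not(mask2)
```
Using the "logical" NOT here creates a boolean array, so we avoid an explicit type conversion.
4. Finally, we use Boolean Indexing to select the elements of mat that need to be zeroed and set them to 0:
```
mat[mask3] = 0
```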
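
To make the four steps concrete, here is a small sketch that runs them on the sample matrix from the question and prints each intermediate mask. The values in the comments are what the steps produce for this particular input.

```python
import numpy as np

mat = np.array([[1, 1, 1, 0],
                [0, 5, 0, 1],
                [2, 1, 3, 10]])

mask1 = np.not_equal(mat, 0)    # step 1: True where mat is non-zero
mask2 = mask1.cumprod(axis=0)   # step 2: running product down each column
# mask2 is [[1 1 1 0]
#           [0 1 0 0]
#           [0 1 0 0]]
mask3 = np.logical_not(mask2)   # step 3: True at and below the first zero
mat[mask3] = 0                  # step 4: boolean indexing assignment

print(mask1.astype(int))
print(mask2)
print(mask3.astype(int))
print(mat)
```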

Optional optimization

If you think about it, we can get rid of steps 3 and 4 by doing the following:

mask2 = mask1.cumprod(axis=0, dtype=bool) # convert result to boolean type
mat *= mask2 # combined step 3&4

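See the "Method 2" section above for the complete implementation.

Performance

Several other answers use numpy.ufunc.accumulate(). Fundamentally, all of these methods revolve around the idea that 0 is a "special" value, either in the sense that 0*anything==0 or, in the case of @DSM's answer, that False=0<True=1, and they let numpy perform a "cumulative" operation on the array.

Apart from my method #1 being slower than the others, there are some differences in performance, but most of them are small.

Below are some timing tests for more of the functions. NOTE: to run the tests properly we need to use large arrays. Tests on small arrays mostly measure overhead, caching, etc.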
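
As a rough check of that claim, the sketch below builds the "no zero seen so far" mask in three ways (cumulative product, cumulative minimum, cumulative logical AND) and compares the results on the sample matrix. The variable names are illustrative only.

```python
import numpy as np

m = np.array([[1, 1, 1, 0],
              [0, 5, 0, 1],
              [2, 1, 3, 10]])

nonzero = m != 0

# 0 * anything == 0: one zero wipes out the rest of the column
via_cumprod = nonzero.cumprod(axis=0, dtype=bool)
# False < True: the running minimum stays False after the first False
via_minimum = np.minimum.accumulate(nonzero, axis=0)
# non-zero integers are treated as True by the logical AND
via_logical_and = np.logical_and.accumulate(m, axis=0)

print(np.array_equal(via_cumprod, via_minimum))
print(np.array_equal(via_cumprod, via_logical_and))
```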

In [1]: import sys
    ...: import numpy as np
    ...: 

In [2]: print(sys.version)
    ...: 
3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]

In [3]: print(np.__version__)
    ...: 
1.12.1

In [4]: # Method 1 (Original)
    ...: def transform1(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
    ...:     return mat
    ...: 

In [5]: # Method 2:
    ...: def transform2(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat *= (mat != 0).cumprod(axis=0, dtype=np.bool)
    ...:     return mat
    ...: 

In [6]: # @DSM method:
    ...: def transform_DSM(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat *= np.minimum.accumulate(mat != 0)
    ...:     return mat
    ...: 

In [7]: # @DanielF method:
    ...: def transform_DanielF(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
    ...:     return mat
    ...: 

In [8]: # Optimized @DanielF method:
    ...: def transform_DanielF_optimized(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat *= np.logical_and.accumulate(mat, dtype=np.bool)
    ...:     return mat
    ...: 

In [9]: matrix = np.random.randint(0, 20000, (20000, 20000))

In [10]: %timeit -n1 transform1(matrix)
22.1 s ± 241 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit -n1 transform2(matrix)
9.29 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [12]: %timeit -n1 transform3(matrix)
9.23 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit -n1 transform_DSM(matrix)
9.24 s ± 195 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [14]: %timeit -n1 transform_DanielF(matrix)
10.3 s ± 219 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: %timeit -n1 transform_DanielF_optimized(matrix)
9.27 s ± 187 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

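My initial solution (Method 1) is the slowest, while the other methods are much faster. @DanielF's original method is somewhat slower because it uses boolean indexing (but its optimized variant is as fast as the other optimized methods).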
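
To repeat this kind of comparison outside IPython, something like the following sketch with the standard `timeit` module could be used. The array size, value range and seed are assumptions chosen to keep the run short, and they are much smaller than in the session above. Each run gets a fresh copy because the functions modify their input, so the copy time is included in the measurement.

```python
import timeit
import numpy as np

def transform2(matrix):
    mat = np.asarray(matrix)
    mat *= (mat != 0).cumprod(axis=0, dtype=bool)
    return mat

def transform_DSM(matrix):
    mat = np.asarray(matrix)
    mat *= np.minimum.accumulate(mat != 0)
    return mat

rng = np.random.default_rng(0)                   # fixed seed (assumption)
base = rng.integers(0, 20, size=(1000, 1000))    # assumed test size

for func in (transform2, transform_DSM):
    # best of 3 single runs, each on its own copy of the data
    elapsed = min(timeit.repeat(lambda: func(base.copy()), number=1, repeat=3))
    print(f"{func.__name__}: {elapsed:.4f} s")
```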

Answer 3

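A variant of the cumprod approach is to use a cumulative minimum (or maximum). I slightly prefer this because it lets you avoid any arithmetic operations beyond comparison if you want to, although it is hard to get worked up about it: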

In [37]:  m
Out[37]: 
array([[ 1,  1,  1,  0],
       [ 0,  5,  0,  1],
       [ 2,  1,  3, 10]])

In [38]: m * np.minimum.accumulate(m != 0)
Out[38]: 
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])

In [39]: np.where(np.minimum.accumulate(m != 0), m, 0)
Out[39]: 
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])
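
The "(or maximum)" variant mentioned above is not spelled out in this answer. One way it could look, as a sketch: track where a zero has already been seen with a cumulative maximum of `m == 0`, then select with `np.where`. The name `seen_zero` is illustrative.

```python
import numpy as np

m = np.array([[1, 1, 1, 0],
              [0, 5, 0, 1],
              [2, 1, 3, 10]])

# True at the first zero of each column and at every row below it
seen_zero = np.maximum.accumulate(m == 0, axis=0)
result = np.where(seen_zero, 0, m)  # returns a new array, m is unchanged
print(result)
```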

Answer 4

A more optimized version of @AGNGazer's solution, using np.logical_and.accumulate and the implicit boolean casting of integers (which avoids a lot of multiplication):

def transform(matrix):
    mat = np.asarray(matrix)
    mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
    return mat

transform(m)
Out:
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])

Timings:

%timeit transform2(m) # AGN's solution
The slowest run took 44.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.93 µs per loop

%timeit transform(m)
The slowest run took 9.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.99 µs per loop

m = np.random.randint(0,5,(100,100))

%timeit transform(m)
The slowest run took 6.03 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 43.9 µs per loop

%timeit transform2(m)
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 50.4 µs per loop

This looks like a speedup of about 15%.
