Efficient way to find indices of topmost True values in 2d boolean array (Python)

Question

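Suppose I have a 2D boolean array of shape (nrows, ncols). I am trying to efficiently extract the index of the topmost True value in each column. If a column is all False, no index should be returned for that column. Below is an example boolean array of shape (4,6); the position of the topmost True in each column is the desired output.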

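False  False  False  False  False  False
True   False  False  True   False  False
True   False  True   False  False  True
True   False  True   True   False  False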

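Desired output as (row, col) indices: [(1,0),(2,2),(1,3),(2,5)]

I have tried numpy.where as well as an implementation of the skyline algorithm, but both options are slow. Is there a more efficient way to solve this problem?

Thanks in advance for your help.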

Answer 1

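I suggest you try this:

import numpy as np

def get_topmost(ar: np.ndarray):
    # ar.T.tolist() yields one Python list per column;
    # list.index(True) is the row of that column's first True.
    return [(row.index(True), i) for i, row in enumerate(ar.T.tolist()) if True in row]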

Example (should work as-is):

>>> test = np.array([
    [False, False, False, False, False, False],
    [True,  False, False, True,  False, False],
    [True,  False, True,  False, False, True],
    [True,  False, True,  True,  False, False],
])

>>> print(get_topmost(test))
[(1, 0), (2, 2), (1, 3), (2, 5)]

Answer 2

If you are open to using pandas, you can construct a DataFrame, keep only the columns that contain at least one True, and then call idxmax:

import pandas as pd

arr = [[False, False, False, False, False, False],
       [True, False, False, True, False, False],
       [True, False, True, False, False, True],
       [True, False, True, True, False, False]]

df = pd.DataFrame(arr, columns=range(len(arr[0])))

s = df.loc[:, df.sum()>0].idxmax()
print (s)

Result:

0    1
2    2
3    1
5    2
dtype: int64

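The Series index holds the column and the values hold the row of the first True. You can convert it back to the form you want: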

print (list(zip(s, s.index)))

[(1, 0), (2, 2), (1, 3), (2, 5)]

Answer 3

You can use np.argmax to find the first True value in each column.

Prepare the example array.

import numpy as np
a = np.array(
[[0,0,0,0,0,0],
 [1,0,0,1,0,0],
 [1,0,1,0,0,1],
 [1,0,1,1,0,0]]).astype('bool')
a

Output

array([[False, False, False, False, False, False],
       [ True, False, False,  True, False, False],
       [ True, False,  True, False, False,  True],
       [ True, False,  True,  True, False, False]])

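Stack a row of False on top to handle columns that contain no True. Find the first True in each column with np.argmax and pair it with an arange holding the column indices. You have to adjust the row indices by -1 because we added a row to the array. Then select the columns where the index of the first True is greater than 0.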

b = np.vstack([np.zeros_like(a[0]),a])
t = b.argmax(axis=0)
np.vstack([t - 1, np.arange(len(a[0]))]).T[t > 0]

Output

array([[1, 0],
       [2, 2],
       [1, 3],
       [2, 5]])
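To see why the extra row of False helps, here is a minimal sketch on a small made-up array (the `demo`, `plain` and `padded` names are only for illustration). Without padding, argmax returns 0 both for a column whose first True is in row 0 and for a column with no True at all, so the two cases cannot be told apart from the result alone.

```python
import numpy as np

# Hypothetical 2x3 example:
#   column 0 has a True in row 0,
#   column 1 has no True at all,
#   column 2 has its first True in row 1.
demo = np.array([
    [True,  False, False],
    [False, False, True],
])

# Without padding, columns 0 and 1 both give 0.
plain = demo.argmax(axis=0)

# With a row of False stacked on top, an all-False column still gives 0,
# but every real hit is shifted to 1 or more.
padded = np.vstack([np.zeros_like(demo[0]), demo]).argmax(axis=0)

print(plain)   # [0 0 1]
print(padded)  # [1 0 2]
```

After padding, a result of 0 can only mean "no True in this column", which is what the `t > 0` filter relies on.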

Translating the pandas answer to numpy gives a one-line solution

np.vstack([a.argmax(axis=0), np.arange(len(a[0]))]).T[a.sum(0) > 0]

Output

array([[1, 0],
       [2, 2],
       [1, 3],
       [2, 5]])
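If you want the list of (row, col) tuples shown in the question rather than a 2D array, the one-liner could be wrapped roughly as follows. This is only a sketch: the function name `topmost_true_indices` is made up here, and it uses `any(axis=0)` instead of `sum(0) > 0` to find the columns that contain a True, which should give the same mask for a boolean array.

```python
import numpy as np

def topmost_true_indices(a: np.ndarray) -> list:
    """Return (row, col) of the first True in each column that has one."""
    has_true = a.any(axis=0)           # mask of columns with at least one True
    cols = np.flatnonzero(has_true)    # indices of those columns
    rows = a.argmax(axis=0)[has_true]  # row of the first True in those columns
    return list(zip(rows.tolist(), cols.tolist()))

a = np.array([
    [False, False, False, False, False, False],
    [True,  False, False, True,  False, False],
    [True,  False, True,  False, False, True],
    [True,  False, True,  True,  False, False],
])

print(topmost_true_indices(a))  # [(1, 0), (2, 2), (1, 3), (2, 5)]
```

Whether this is actually faster than your current approach depends on the array size, so it is worth timing it on your own data.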
