At least one True value per column in numpy boolean array

Question

Suppose I have a very large 2-D boolean array (for the sake of the example, let's take 4 rows x 3 columns):

toto = np.array([[True, True, False],
                [False, True, False],
                [True, False, False],
                [False, True, False]])

I would like to transform toto so that every column contains at least one True value, while leaving the other columns unchanged.

Edit: the rule is simply this: if a column is all False, I want to introduce a True in a random row.

So in this example, one of the False values in the third column should become True.
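To make the rule concrete, here is a plain per-column loop that does exactly what the edit describes. It is a sketch for checking results rather than an efficient solution; it assumes NumPy 1.17 or newer for `np.random.default_rng`, and the helper name `ensure_true_per_column` is hypothetical.

```python
import numpy as np


def ensure_true_per_column(arr, rng=None):
    """Hypothetical helper: return a copy of the 2-D boolean array `arr` in which
    every all-False column gets a single True at a randomly chosen row.
    Columns that already contain a True are left unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    out = arr.copy()
    n_rows, n_cols = out.shape
    for j in range(n_cols):
        if not out[:, j].any():
            # Pick one row uniformly at random and set it to True
            out[rng.integers(n_rows), j] = True
    return out


toto = np.array([[True, True, False],
                 [False, True, False],
                 [True, False, False],
                 [False, True, False]])

result = ensure_true_per_column(toto, rng=np.random.default_rng(0))
print(result.any(axis=0))  # → [ True  True  True]
print(result[:, 2].sum())  # → 1
```

The loop touches each column once in Python, so it is slow for very large arrays; the vectorized answers below do the same thing with a single fancy-indexing assignment.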

How would you do this efficiently?

Thanks in advance

Answer 1

You can do it like this:

import numpy as np

col_mask = ~np.any(toto, axis=0)  # True for columns with no True value
row_idx = np.random.randint(toto.shape[0], size=np.sum(col_mask))  # one random row per such column
toto[row_idx, col_mask] = True

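col_mask is array([False, False, True]), i.e. it marks the columns that need changing. row_idx is an array of row indices, one randomly chosen row for each of those columns.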
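Running the three lines of this answer on the question's example makes the intermediate values visible. This sketch swaps in the newer Generator API (`np.random.default_rng`, NumPy 1.17 or newer) with a fixed seed so the run is reproducible; `np.random.randint` as used in the answer behaves the same way for this purpose.

```python
import numpy as np

toto = np.array([[True, True, False],
                 [False, True, False],
                 [True, False, False],
                 [False, True, False]])

# Columns with no True value at all
col_mask = ~np.any(toto, axis=0)
print(col_mask)          # → [False False  True]
print(np.sum(col_mask))  # → 1

# One random row index in [0, num_rows) per all-False column
rng = np.random.default_rng(42)
row_idx = rng.integers(toto.shape[0], size=int(col_mask.sum()))

# Pairs (row_idx[k], k-th all-False column) are set to True in place
toto[row_idx, col_mask] = True
print(np.any(toto, axis=0))  # → [ True  True  True]
```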

Answer 2

import numpy as np

toto = np.array([[False, True, False], [False, True, False],
                 [False, False, False], [False, True, False]])

# First we get a boolean array indicating columns that have at least one True value
mask = np.any(toto, axis=0)

# Now we invert the mask to get the column indexes (as a boolean array) with no True value
mask = np.logical_not(mask)

# Notice that if we index with this mask on the column dimension, we get the
# elements of all rows, but only in the columns containing no True value. The
# shape is "num_rows x num_columns_without_true"
toto[:, mask]  # for inspection only; advanced indexing returns a copy

# Now we need random row indexes for the columns containing only False. That
# means an array of integers from 0 to `num_rows - 1` with
# `num_columns_without_true` elements
row_indexes = np.random.randint(toto.shape[0], size=np.sum(mask))

# Now we can use the row indexes together with the column mask to select one
# element in each all-False column and set it to True
toto[row_indexes, mask] = True

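Disclaimer: mathfux was faster, and his solution is essentially the same as the one I was writing (if it is what you are looking for, please accept his answer), but since I wrote more comments, I decided to post mine anyway.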
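The key step in both answers is indexing with an integer array in the row position and a boolean mask in the column position. The sketch below uses a small hypothetical 4 x 5 array with fixed (not random) row indexes to check that the mask behaves like its `np.flatnonzero` indices, that the two index arrays are paired elementwise, and how this differs from `np.ix_` and from chained indexing.

```python
import numpy as np

toto = np.zeros((4, 5), dtype=bool)
toto[0, 1] = True                     # column 1 already has a True
mask = ~np.any(toto, axis=0)          # [ True False  True  True  True]
row_indexes = np.array([3, 0, 2, 1])  # one row per all-False column (fixed for the demo)

# A boolean mask inside an index tuple acts like its nonzero() indices,
# so these two assignments touch exactly the same elements
a = toto.copy()
a[row_indexes, mask] = True
b = toto.copy()
b[row_indexes, np.flatnonzero(mask)] = True
print(np.array_equal(a, b))  # → True

# Indexes are paired elementwise: (3, 0), (0, 2), (2, 3), (1, 4)
print(a.sum(axis=0))  # → [1 1 1 1 1]

# np.ix_ builds the outer product instead (every chosen row x every chosen column)
c = toto.copy()
c[np.ix_(row_indexes, np.flatnonzero(mask))] = True
print(c.sum(axis=0))  # → [4 1 4 4 4]

# Chained indexing writes into a temporary copy, so d is not modified
d = toto.copy()
d[:, mask][row_indexes] = True
print(d.sum())  # → 1
```

Because the pairing is elementwise, the row index array must have exactly one entry per all-False column, which is why both answers pass `size=np.sum(mask)` to the random generator.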
