How can I reproduce data in numpy with random.choice?

Question

I have a labeled dataset:

import numpy as np

data = np.array([5.2, 4, 5, 2, 5.3, 10, 0])
labels = np.array([1, 0, 1, 2, 1, 3, 4])

I want to select the data with label 1 (5.2, 5 and 5.3) and reproduce it, like this:

datalabel1 = data[(labels == 1)]

Then I want to do a random.choice(), something like this (pseudocode):

# indices are the positions in data that have label 1
random_choices = np.random.choice(indices, size=5)

and get as output different values, together with the different indices they came from:

# values drawn at random from the label-1 pool, and their indices in data
data:    [5.3 5.2 5.2 5 5]
indices: [4 0 0 2 2]

My goal is to draw samples from the pool of data whose label is 1.

Answer 1

labels == 1 is a boolean mask. To get the elements of data that are labeled 1, apply the mask to data, not back to labels:

np.random.choice(data[labels == 1], ...)
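A minimal runnable sketch of this first approach, using the question's data and labels. It assumes the newer Generator API (np.random.default_rng) instead of the legacy np.random.choice; the seed 0 and size=5 are arbitrary choices for illustration. Note that this returns only the sampled values, not their positions in data.

```python
import numpy as np

data = np.array([5.2, 4, 5, 2, 5.3, 10, 0])
labels = np.array([1, 0, 1, 2, 1, 3, 4])

# Boolean mask: True where the label is 1
mask = labels == 1
pool = data[mask]  # array([5.2, 5. , 5.3])

# Draw 5 values from the pool, with replacement (the default for choice)
rng = np.random.default_rng(0)  # arbitrary seed, only for repeatable runs
samples = rng.choice(pool, size=5)
print(samples)
```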

You can also convert labels == 1 into an array of indices and make the random choice on those indices before indexing into data:

indices = np.flatnonzero(labels == 1)
data[np.random.choice(indices, ...)]
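If you also need to know which positions in data were picked (the indices line in the question's desired output), here is a sketch of this second approach that keeps the sampled indices. As above, the Generator API and the seed 42 are assumptions for illustration; np.random.choice(indices, size=5) behaves the same way using NumPy's legacy global random state.

```python
import numpy as np

data = np.array([5.2, 4, 5, 2, 5.3, 10, 0])
labels = np.array([1, 0, 1, 2, 1, 3, 4])

# Positions in data whose label is 1
indices = np.flatnonzero(labels == 1)  # array([0, 2, 4])

rng = np.random.default_rng(42)  # arbitrary seed, for repeatable runs
picked = rng.choice(indices, size=5)  # sampled with replacement
values = data[picked]  # the data values at the sampled positions

print("data:   ", values)
print("indices:", picked)
```

With a fixed seed, re-running the script on the same NumPy version reproduces exactly the same draw; omit the seed to get a fresh sample each time.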
