How to pick rows in a 2d Numpy array and store into another so that one array keeps the rows with the highest values in a given column?

Question

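I have a series of 2d Numpy arrays, each with a few columns and a few thousand rows.

The first column of each row holds a "quality" value.

I also have an empty 2d Numpy array with 50 rows and the same number of columns as the arrays above.

I want to iterate over these 2d arrays and select the rows with the highest quality values into the second array, so that at the end the second array holds the highest-quality rows out of all the initial 2d arrays.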

For example:

import numpy as np

arrays = [
    np.array([[10, 1, 1, 1], [1, 1, 1, 1], [3, 1, 1, 1], [2, 1, 1, 1]]),
    np.array([[1, 2, 2, 2], [1, 1, 1, 1], [1, 1, 1, 1], [200, 1, 1, 1]]),
    np.array([[1, 2, 2, 2], [40, 1, 1, 1], [30, 1, 1, 1], [2, 1, 1, 1]]),
    np.array([[300, 2, 2, 2], [1, 1, 1, 1], [3, 1, 1, 1], [2, 1, 1, 1]]),
]

best_arrays = np.zeros((2, 4), dtype=np.int64)  # np.int was removed in NumPy 1.24

for i in range(len(arrays)):
   arr = arrays[i]

   # do something so that the rows with highest quality are selected into the `best_arrays` array

   print(best_arrays)

So the loop would print:

>> [[10, 1, 1, 1], [3, 1, 1, 1]]  # best rows of first array
>> [[10, 1, 1, 1], [200, 1, 1, 1]]  # best rows between first and second arrays
>> [[40, 1, 1, 1], [200, 1, 1, 1]]
>> [[300, 2, 2, 2], [200, 1, 1, 1]]  # best_arrays has the rows with highest "quality" of all.

How can I do this in Numpy? The arrays have a lot of rows, so I can't just iterate over them in pure Python. I'm looking for a Numpy function so that the work runs in C.

Thanks!

Answer 1

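Since you say the number of rows is in the thousands (which is not that many), I think this approach will be reasonably efficient, and it is flexible about how many of the best rows you pick from each 2d array.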

best = 2
quality_col = 0
for array in arrays:
    # sort the rows by the "quality" column (descending) and keep the first `best` rows
    print(array[np.flip(array[:, quality_col].argsort())][:best, :])

Output

[[10  1  1  1]
 [ 3  1  1  1]]
[[200   1   1   1]
 [  1   1   1   1]]
[[40  1  1  1]
 [30  1  1  1]]
[[300   2   2   2]
 [  3   1   1   1]]
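
If a full sort of every array turns out to be more work than needed, one possible variation is to partition first and sort only the selected rows. The sketch below assumes `np.argpartition` is acceptable for this purpose; the helper name `top_rows` and its parameters are hypothetical and not part of the answer above.

```python
import numpy as np

def top_rows(array, best, quality_col=0):
    """Return up to `best` rows with the largest values in `quality_col`, highest first."""
    best = min(best, len(array))
    if best == 0:
        return array[:0]
    # argpartition leaves the indices of the `best` largest values
    # in the last `best` positions, in no particular order
    candidates = np.argpartition(array[:, quality_col], -best)[-best:]
    # sort only those few candidates, descending by quality
    order = np.argsort(array[candidates, quality_col])[::-1]
    return array[candidates[order]]

arr = np.array([[10, 1, 1, 1], [1, 1, 1, 1], [3, 1, 1, 1], [2, 1, 1, 1]])
print(top_rows(arr, 2))
```

For arrays of a few thousand rows the difference from a plain `argsort` is likely small, so this is mainly worth considering when the arrays are much larger than the number of rows kept.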

Answer 2

Here is a working example derived from @sai's answer, with a final step that stacks the arrays and selects the best rows:

import numpy as np

arrays = [
    np.array([[10, 1, 1, 1], [1, 1, 1, 1], [3, 1, 1, 1], [2, 1, 1, 1]]),
    np.array([[1, 2, 2, 2], [1, 1, 1, 1], [1, 1, 1, 1], [200, 1, 1, 1]]),
    np.array([[1, 2, 2, 2], [40, 1, 1, 1], [30, 1, 1, 1], [2, 1, 1, 1]]),
    np.array([[300, 2, 2, 2], [1, 1, 1, 1], [3, 1, 1, 1], [2, 1, 1, 1]]),
]

best_arrays = np.zeros((0, 4), dtype=np.int64)

best = 2
quality_col = 0
for arr in arrays:
    # stack the current best rows with the next array
    best_arrays = np.vstack((best_arrays, arr))

    # select the best rows
    best_arrays = best_arrays[np.flip(best_arrays[:, quality_col].argsort())][:best, :]

print(best_arrays)

This selects the best rows:

[[300   2   2   2]
 [200   1   1   1]]
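
To see the running result after each array, as the question's loop does, the stack-and-trim step can be wrapped in a small helper. This is a minimal sketch under the same assumptions as the example above; the name `keep_best` and the `history` list are hypothetical, and the rows come out sorted by descending quality rather than in the order shown in the question.

```python
import numpy as np

def keep_best(best_rows, new_rows, best, quality_col=0):
    """Merge `new_rows` into `best_rows` and keep the `best` highest-quality rows."""
    stacked = np.vstack((best_rows, new_rows))
    # row indices ordered by descending quality
    order = np.argsort(stacked[:, quality_col])[::-1]
    return stacked[order[:best]]

arrays = [
    np.array([[10, 1, 1, 1], [1, 1, 1, 1], [3, 1, 1, 1], [2, 1, 1, 1]]),
    np.array([[1, 2, 2, 2], [1, 1, 1, 1], [1, 1, 1, 1], [200, 1, 1, 1]]),
    np.array([[1, 2, 2, 2], [40, 1, 1, 1], [30, 1, 1, 1], [2, 1, 1, 1]]),
    np.array([[300, 2, 2, 2], [1, 1, 1, 1], [3, 1, 1, 1], [2, 1, 1, 1]]),
]

best_rows = np.empty((0, 4), dtype=np.int64)  # start with no rows
history = []  # running result after each array
for arr in arrays:
    best_rows = keep_best(best_rows, arr, best=2)
    history.append(best_rows.tolist())
    print(best_rows.tolist())
# [[10, 1, 1, 1], [3, 1, 1, 1]]
# [[200, 1, 1, 1], [10, 1, 1, 1]]
# [[200, 1, 1, 1], [40, 1, 1, 1]]
# [[300, 2, 2, 2], [200, 1, 1, 1]]
```

Because only `best` rows are carried between iterations, each `np.vstack` copies one input array plus a handful of rows, so memory use stays close to the size of a single input.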
