Is there a way to find the UNIQUE row indices of maximum columnar values in a 2D NumPy array?

Question

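For each column of a 2D NumPy array, the column's maximum value can occur more than once. I want to find the row index of the maximum for each column, without repeating any row index.

Here is an example that shows why np.argmax does not work: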

import numpy as np

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

ind = np.argmax(a, axis=0)

print(ind)

Output:

[0 0 2]

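The result I want for this example is [1, 0, 2].

That is:

The row index for the second column must be 0
This means the row index for the first column must be 1
This in turn means the row index for the third column must be 2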
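The two requirements above (each chosen row holds its column's maximum, and no row is chosen twice) can be written down as a small check. This is only a sketch; the helper name `is_valid_assignment` is made up for illustration and is not part of NumPy.

```python
import numpy as np


def is_valid_assignment(a, rows):
    """Hypothetical helper: True if rows[j] is a distinct row holding column j's maximum."""
    a = np.asarray(a)
    rows = np.asarray(rows)
    cols = np.arange(a.shape[1])
    # Every chosen entry must equal the maximum of its column
    hits_max = np.all(a[rows, cols] == a.max(axis=0))
    # No row index may appear more than once
    all_distinct = len(np.unique(rows)) == len(rows)
    return bool(hits_max and all_distinct)


a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

print(is_valid_assignment(a, np.argmax(a, axis=0)))  # False: argmax gives [0, 0, 2], row 0 repeats
print(is_valid_assignment(a, [1, 0, 2]))             # True
```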

A slightly more complicated example is this array:

a = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1]])

In this case, no column has a unique maximum value. I would be happy with either of the following answers:

[0, 1, 2]
[1, 0, 2]

An even more complicated example is:

a = np.array([[1, 1, 1],
              [1, 1, 1],
              [0, 1, 1]])

In this case, I would be happy with any of the following answers:

[0, 1, 2]
[0, 2, 1]
[1, 0, 2]
[1, 2, 0]

I can solve these problems with loops and logical conditions, but I was wondering whether there is a way to solve them using NumPy functions?

Answer 1

Inspired by a suggested solution:

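import numpy as np
import numpy_indexed as npi

# (row, column) positions of every entry equal to its column maximum
ind = np.argwhere(a == a.max(0))
# Candidate rows grouped by column. The groups can differ in length, so they
# are not wrapped in np.array (recent NumPy rejects ragged input there).
l = npi.group_by(ind[:, 1]).split(ind[:, 0])


def pick_one(a, index, buffer, visited):
    # Backtracking search: choose one unused row for each column in turn
    if index == len(a):
        return True
    for item in map(int, a[index]):  # plain ints, so the result prints cleanly
        if item not in visited:
            buffer.append(item)
            visited.add(item)
            if pick_one(a, index + 1, buffer, visited):
                return True
            # Dead end: undo this choice and try the next candidate row
            buffer.pop()
            visited.remove(item)
    return False


buffer = []
pick_one(l, 0, buffer, set())
print(buffer)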

Example:

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

Output:

[1, 0, 2]

Answer 2

It may be overkill, but you can use scipy.optimize.linear_sum_assignment:

from scipy.optimize import linear_sum_assignment

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

linear_sum_assignment(-a.T)[1]
# array([1, 0, 2])

Note that you can always reduce to the 0/1 case using something like

abin = a==a.max(axis=0)

This may speed up the assignment considerably.
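As a rough sketch of how the 0/1 reduction could be combined with the assignment step, the snippet below uses the question's third example. It assumes SciPy 1.4 or later for the `maximize` keyword; the names `cols`, `rows` and `complete` are introduced here for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

a = np.array([[1, 1, 1],
              [1, 1, 1],
              [0, 1, 1]])

# 1 where an entry equals its column maximum, 0 elsewhere
abin = (a == a.max(axis=0)).astype(int)

# Transpose so that each column of `a` is assigned exactly one row of `a`;
# maximize=True replaces negating the matrix (assumes SciPy >= 1.4)
cols, rows = linear_sum_assignment(abin.T, maximize=True)

# The assignment is only a full answer if every column landed on a maximum
complete = bool(abin[rows, cols].all())
print(rows, complete)
```

Several assignments are equally good for this array, so which of the acceptable answers is printed is not guaranteed. If `complete` is False, no selection of distinct rows can cover every column maximum.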

Alternatively, see the graph-theoretic solution.
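One possible graph-theoretic formulation, which is not necessarily the one referred to above, treats columns and rows as the two sides of a bipartite graph with an edge wherever a row holds a column's maximum. A maximum matching then yields the row indices. The sketch below assumes a recent SciPy in which `scipy.sparse.csgraph.maximum_bipartite_matching` follows the documented `perm_type` semantics; the variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

# Edge (j, i) exists when row i of `a` holds the maximum of column j
mask = (a == a.max(axis=0))
graph = csr_matrix(mask.T.astype(np.int8))

# perm_type='column' returns, for each graph row (a column of `a`),
# the matched graph column (a row of `a`); -1 marks an unmatched column
match = maximum_bipartite_matching(graph, perm_type='column')
print(match)  # [1 0 2] for this array, the only complete matching
```

A -1 anywhere in `match` would mean that not every column can be given its own distinct maximal row.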

arrays

numpy

multidimensional-array

python-3.x

argmax