Label regions with unique combinations of values in two numpy arrays?

Question

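I have two labelled 2D numpy arrays a and b with identical shapes. I would like to re-label array b using something similar to a GIS geometric union of the two arrays, so that cells with a unique combination of values in a and b are assigned new unique IDs: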

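I'm not concerned with the specific numbering of the regions in the output, as long as the values are all unique. I have attached sample arrays and the desired output below. My real datasets are much larger, with both arrays having integer labels ranging from "1" to "200000". So far I have experimented with concatenating the array IDs to form unique combinations of values, but ideally I would like to output a simple set of new IDs of the form 1, 2, 3, and so on.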

import numpy as np
import matplotlib.pyplot as plt

# Example labelled arrays a and b
input_a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
                    [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
                    [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
                    [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
                    [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
                    [0, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0],
                    [0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
                    [0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

input_b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
                    [0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
                    [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
                    [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
                    [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
                    [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

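# Plot inputs, one figure each ("spectral" is gone from newer Matplotlib; "nipy_spectral" is the same colormap)
plt.figure()
plt.imshow(input_a, cmap="nipy_spectral", interpolation='nearest')
plt.figure()
plt.imshow(input_b, cmap="nipy_spectral", interpolation='nearest')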

# Desired output, union of a and b
output = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 1, 1, 1, 2, 3, 3, 3, 3, 0, 0],
                   [0, 0, 1, 1, 1, 2, 3, 3, 3, 3, 0, 0],
                   [0, 0, 1, 1, 1, 4, 7, 7, 7, 7, 0, 0],
                   [0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
                   [0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
                   [0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# Plot desired output
plt.figure()
plt.imshow(output, cmap="nipy_spectral", interpolation='nearest')

Answer 1

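If I understand the situation correctly, you are looking for unique pairings of values from a and b. So a 1 in a together with a 1 in b gets one unique label in the output, while a 1 in a together with a 3 in b gets another. Judging by the desired output in the question, there also seems to be an additional condition: wherever b is zero, the output must be zero as well, regardless of the pairing.

The following implementation tries to handle all of that -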

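# Encode each (a, b) pair as a single integer
c = a*(b.max()+1) + b
# Force the output to zero wherever b is zero
c[b==0] = 0
# The inverse indices are consecutive IDs, one per unique code
_,idx = np.unique(c,return_inverse= True)
out = idx.reshape(b.shape)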

Sample run -

In [21]: a
Out[21]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
       [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
       [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
       [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
       [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
       [0, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0],
       [0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
       [0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [22]: b
Out[22]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
       [0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
       [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
       [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
       [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
       [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [23]: out
Out[23]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 3, 5, 5, 5, 5, 0, 0],
       [0, 0, 1, 1, 1, 3, 5, 5, 5, 5, 0, 0],
       [0, 0, 1, 1, 1, 2, 4, 4, 4, 4, 0, 0],
       [0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
       [0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
       [0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
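
The numbering in out differs from the desired output in the question (the cells with a=1, b=3 are labelled 3 here and 2 there), which the question explicitly allows. One way to confirm that two labelings describe the same regions is to check that their values map one-to-one. The helper name same_regions is hypothetical and the small arrays are made up for illustration; this is a sketch, not part of the original answer.

```python
import numpy as np

def same_regions(x, y):
    """Return True if x and y label the same regions, ignoring numbering."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    if x.shape != y.shape:
        return False
    # Distinct (x, y) label pairs that occur at the same cell.
    pairs = np.unique(np.stack([x, y], axis=1), axis=0)
    # One-to-one: each x label and each y label appears in exactly one pair.
    return (len(pairs) == len(np.unique(x))) and (len(pairs) == len(np.unique(y)))

# Made-up example: same partition, different numbering.
p = np.array([[0, 1, 1], [2, 2, 3]])
q = np.array([[0, 5, 5], [7, 7, 4]])
# Made-up example: q2 merges two regions of p, so the partitions differ.
q2 = np.array([[0, 5, 5], [5, 5, 4]])

print(same_regions(p, q))
print(same_regions(p, q2))
```

Applied to the sample data, a check of this kind would be called as same_regions(out, output).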

Sample plot -

# Plot inputs ("spectral" is gone from newer Matplotlib; "nipy_spectral" is the same colormap)
plt.figure()
plt.imshow(a, cmap="nipy_spectral", interpolation='nearest')
plt.figure()
plt.imshow(b, cmap="nipy_spectral", interpolation='nearest')

# Plot output
plt.figure()
plt.imshow(out, cmap="nipy_spectral", interpolation='nearest')
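
With labels as large as the 200000 mentioned in the question, the product a*(b.max()+1) can reach roughly 4e10, which does not fit in a 32-bit integer. If the real arrays are stored as int32 (an assumption; the default integer width depends on platform and NumPy version), casting to int64 first should avoid silent wraparound. A minimal sketch of the same encoding, wrapped in a hypothetical function label_pairs and run on made-up inputs:

```python
import numpy as np

def label_pairs(a, b):
    """Relabel cells by unique (a, b) pair; cells where b == 0 stay 0."""
    a64 = np.asarray(a, dtype=np.int64)
    b64 = np.asarray(b, dtype=np.int64)
    # Encode each pair as one integer; int64 keeps the product exact
    # for non-negative labels in the range given in the question.
    code = a64 * (b64.max() + 1) + b64
    code[b64 == 0] = 0
    # Per cell, the rank of its code among the sorted unique codes.
    _, inverse = np.unique(code, return_inverse=True)
    return inverse.reshape(b64.shape)

# Made-up int32 inputs with large labels.
a = np.array([[200000, 200000], [1, 0]], dtype=np.int32)
b = np.array([[200000, 199999], [0, 3]], dtype=np.int32)
print(label_pairs(a, b))
```

As in the answer above, the IDs are only guaranteed to start at 0 for the background when at least one cell has b equal to zero.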

Answer 2

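Here is a way to do it conceptually in terms of a set union rather than a GIS geometric union, since the latter was mentioned only after I answered.

Make a list of all possible unique 2-tuples of values, with the first element from a and the second from b. Map each tuple in that list to its index in the list. Create the union array using that map.

For example, say a and b are arrays that each contain values in range(4), and assume for simplicity that they have the same shape. Then: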

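from itertools import permutations

v = range(4)
# Ordered pairs of distinct values only; pairs such as (2, 2) get no entry
p = list(permutations(v,2))
# Map each pair to its index in the list
m = {}
for i,x in enumerate(p):
    m[x] = i
# Look up the index of the (a, b) pair at every cell
union = np.empty_like(a)
for i,x in np.ndenumerate(a):
    union[i] = m[(x,b[i])]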

For demonstration, generating a and b with

np.random.randint(4, size=(3, 3))

produced:

a = array([[3, 0, 3],
           [1, 3, 2],
           [0, 0, 3]])

b = array([[1, 3, 1],
           [0, 0, 1],
           [2, 3, 0]])

m = {(0, 1): 0,
     (0, 2): 1,
     (0, 3): 2,
     (1, 0): 3,
     (1, 2): 4,
     (1, 3): 5,
     (2, 0): 6,
     (2, 1): 7,
     (2, 3): 8,
     (3, 0): 9,
     (3, 1): 10,
     (3, 2): 11}

union = array([[10,  2, 10],
               [ 3,  9,  7],
               [ 1,  2,  9]])

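In this case, the property that a union should be greater than or equal to its components is reflected in larger numerical values rather than in a larger number of elements.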
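
Because m is built from permutations, a cell where a and b hold the same value would raise a KeyError; the random demo above happens not to contain such a cell. A possible variant, sketched here under the same assumption that both arrays take values in range(4), uses itertools.product, which also enumerates the equal pairs. The input arrays are made up for illustration.

```python
from itertools import product

import numpy as np

n_values = 4  # assumed: both arrays hold values in range(4)
# Map every ordered pair (x, y), including x == y, to its index.
m = {pair: i for i, pair in enumerate(product(range(n_values), repeat=2))}

# Made-up inputs that include cells where a and b are equal.
a = np.array([[3, 0, 2],
              [1, 1, 2]])
b = np.array([[1, 0, 2],
              [0, 1, 3]])

union = np.empty_like(a)
for idx, x in np.ndenumerate(a):
    union[idx] = m[(x, b[idx])]

print(union)
```

With product the index of a pair (x, y) is simply n_values * x + y, so the dictionary could also be replaced by that arithmetic.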

Answer 3

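One issue with using itertools permutations is that the number of permutations can be far larger than needed. This is especially so when the number of overlaps per region is much smaller than the number of regions.

The title says union, but the picture shows an intersection. Divakar's answer reproduces the intersection shown in the picture and is more elegant than my solution below, which produces the union.

One can build a dictionary containing only the pairs that actually overlap and work from that. I flatten the input arrays first because it makes things easier for me to see; I'm not sure whether that is feasible for you: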

import numpy

shp = numpy.shape(input_a)
a = input_a.flatten()
b = input_b.flatten()
s = set(zip(a, b))                            # unique pairings
d = {p: i for i, p in enumerate(sorted(s))}   # dict{pair: index}
output_c = numpy.array([d[i, j] for i, j in zip(a, b)]).reshape(shp)

array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  1,  1,  1,  1,  1,  5,  5,  5,  5,  5,  0],
       [ 0,  1,  1,  1,  1,  1,  5,  5,  5,  5,  5,  0],
       [ 0,  1,  2,  2,  2,  4,  7,  7,  7,  7,  5,  0],
       [ 0,  1,  2,  2,  2,  4,  7,  7,  7,  7,  5,  0],
       [ 0,  1,  2,  2,  2,  3,  6,  6,  6,  6,  5,  0],
       [ 0,  8,  9,  9,  9, 10,  6,  6,  6,  6,  5,  0],
       [ 0,  0,  9,  9,  9, 10,  6,  6,  6,  6,  0,  0],
       [ 0,  0,  9,  9,  9, 10,  6,  6,  6,  6,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])
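
The same pair-to-index mapping can probably be obtained without Python-level loops by letting np.unique work on the stacked (a, b) rows with axis=0. This is a sketch rather than the answerer's code: the name union_labels is hypothetical, the inputs are made up, and it assumes a NumPy version whose np.unique accepts the axis argument.

```python
import numpy as np

def union_labels(a, b):
    """Give each distinct (a, b) pair its own index, in sorted pair order."""
    pairs = np.stack([np.ravel(a), np.ravel(b)], axis=1)  # shape (n_cells, 2)
    # Rows are deduplicated and sorted lexicographically, like sorted(set(...)).
    _, inverse = np.unique(pairs, axis=0, return_inverse=True)
    return inverse.reshape(np.shape(a))

# Made-up inputs for illustration.
a = np.array([[0, 1, 1],
              [2, 2, 0]])
b = np.array([[0, 0, 3],
              [3, 1, 1]])

labels = union_labels(a, b)

# Cross-check against the dictionary approach on the same inputs.
cells = list(zip(a.ravel().tolist(), b.ravel().tolist()))
d = {p: i for i, p in enumerate(sorted(set(cells)))}
expected = np.array([d[p] for p in cells]).reshape(a.shape)
print(np.array_equal(labels, expected))
```

Masking the result with labels[b == 0] = 0 would move it toward the intersection-style output shown in the question, although the remaining IDs would then no longer be consecutive.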
