Identify vectors with same value in one column with numpy in python

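I have a large 2D array of vectors. I want to split this array into several arrays according to one of the vectors' elements or dimensions. Whenever the values in that column are the same for consecutive rows, I would like to get those rows back as one smaller array. For example, considering the third dimension or column: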

orig = np.array([[1, 2, 3], 
                 [3, 4, 3], 
                 [5, 6, 4], 
                 [7, 8, 4], 
                 [9, 0, 4], 
                 [8, 7, 3], 
                 [6, 5, 3]])

I want to turn this into three arrays, consisting of rows 1-2, rows 3-5, and rows 6-7:

>>> a
array([[1, 2, 3],
       [3, 4, 3]])

>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])

>>> c
array([[8, 7, 3],
       [6, 5, 3]])

I am new to python and numpy. Any help would be greatly appreciated.

Regards, Mat

Edit: I reformatted the array to clarify the problem.

Nothing fancy here, but this good old-fashioned loop should do the trick:

import numpy as np

a = np.array([[1, 2, 3], 
              [1, 2, 3], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 3], 
              [1, 2, 3]])
groups = []
rows = [a[0]]    # rows of the current group, collected in a plain list
prev = a[0][-1]  # assumes the grouping is based on the last column; change the index if that is not the case
for row in a[1:]:
    if row[-1] == prev:
        rows.append(row)
    else:
        # the value changed: store the finished group as a 2-D array and start a new one
        groups.append(np.array(rows))
        rows = [row]
    prev = row[-1]
groups.append(np.array(rows))  # do not forget the last group

print(groups)

## [array([[1, 2, 3],
##         [1, 2, 3]]),
##  array([[1, 2, 4],
##         [1, 2, 4],
##         [1, 2, 4]]),
##  array([[1, 2, 3],
##         [1, 2, 3]])]
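
As a possible variation on the loop above (not part of the original answer), the standard library's itertools.groupby also groups consecutive items that share a key. This is only a sketch, assuming the same array a and grouping on the last column:

```python
from itertools import groupby

import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

# groupby yields one (key, iterator) pair per run of consecutive rows
# whose last element is equal; each run is turned back into a 2-D array.
groups = [np.array(list(rows)) for _, rows in groupby(a, key=lambda row: row[-1])]

print([g.shape for g in groups])  # [(2, 3), (3, 3), (2, 3)]
```

Like the explicit loop, this iterates row by row in Python, so for large arrays the vectorized answers are probably faster.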

Use np.split:

>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)

>>> a
array([[1, 2, 3],
       [3, 4, 3]])
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])
>>> c
array([[8, 7, 3],
       [6, 5, 3]])
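
To see why this one-liner works, here is the same computation broken into steps on the array from the question. The intermediate names col, changed, boundaries and parts are mine, introduced only for illustration:

```python
import numpy as np

orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])

col = orig[:, 2]                       # the column to group on
changed = col[:-1] != col[1:]          # True where a row differs from the next row
boundaries = np.where(changed)[0] + 1  # index of the first row of each new group
parts = np.split(orig, boundaries)     # cut the array at those row indices

print(boundaries)  # [2 5]
```

Note that the unpacking a, b, c = ... only works when there are exactly three groups; in general it is safer to keep the list that np.split returns.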

If a looks like this:

array([[1, 1, 2, 3],
       [2, 1, 2, 3],
       [3, 1, 2, 4],
       [4, 1, 2, 4],
       [5, 1, 2, 4],
       [6, 1, 2, 3],
       [7, 1, 2, 3]])

then do this:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)

Result:

[array([[1, 1, 2, 3],
       [2, 1, 2, 3]]), array([[3, 1, 2, 4],
       [4, 1, 2, 4],
       [5, 1, 2, 4]]), array([[6, 1, 2, 3],
       [7, 1, 2, 3]])]

Update: np.split() is much nicer. There is no need to add the first and last index:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)
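
If this is needed in more than one place, the three lines could be wrapped in a small helper. The function name split_on_runs and its col parameter are hypothetical, chosen here for illustration; the sketch assumes a non-empty 2-D array:

```python
import numpy as np


def split_on_runs(arr, col=-1):
    """Split a 2-D array into sub-arrays of consecutive rows that share arr[:, col]."""
    arr = np.asarray(arr)
    values = arr[:, col]
    # positions where the value differs from the previous row start a new group
    cuts = np.flatnonzero(values[1:] != values[:-1]) + 1
    return np.split(arr, cuts)


a = np.array([[1, 1, 2, 3],
              [2, 1, 2, 3],
              [3, 1, 2, 4],
              [4, 1, 2, 4],
              [5, 1, 2, 4],
              [6, 1, 2, 3],
              [7, 1, 2, 3]])

res = split_on_runs(a)
print([len(r) for r in res])  # [2, 3, 2]
```

If every row has the same value in that column, cuts is empty and np.split returns a list containing the whole array.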