numpy filter for non-unique elements in a 2D array

Question

import numpy as np

data = np.array(
    [
        ['a', 'a'],
        ['a', 'b'],
        ['d', 'c'],
        ['a', 'b'],
        ['d', 'c'],
        ['a', 'a'],
        ['b', 'a'],
        ['c', np.nan]
    ]
)

How can I filter for the most frequent sub-arrays? Expected result: [['a' 'a'], ['d' 'c']]

Answer 1

I don't fully understand the question, but I think np.unique might be useful.

data = np.array(
    [
        ['a', 'a'],
        ['a', 'b'],
        ['d', 'c'],
        ['a', 'b'],
        ['d', 'c'],
        ['a', 'a'],
        ['b', 'a'],
        ['c', np.nan]
    ]
)

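# Count how often each value occurs in the first column; idx holds the
# row index of each value's first occurrence.
unique, idx, counts = np.unique(data[:, 0], return_counts=True, return_index=True)
threshold = 1
# Keep the first row for every first-column value that appears more than once.
data[idx[counts > threshold]]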

Output:

array([['a', 'a'],
       ['d', 'c']], dtype='<U32')
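Note that the snippet above counts values in the first column only. If the goal is instead to count entire rows, a minimal sketch using the axis argument of np.unique might look like the following. The variable names rows and repeated_rows are illustrative, and the last row uses the string 'nan' on the assumption that the mixed str/float input is stored as text anyway (as the '<U32' dtype in the output above suggests).

```python
import numpy as np

# Same data as above; 'nan' is written as a string here (an assumption,
# mirroring how the mixed input ends up stored in a '<U32' array).
data = np.array(
    [
        ['a', 'a'],
        ['a', 'b'],
        ['d', 'c'],
        ['a', 'b'],
        ['d', 'c'],
        ['a', 'a'],
        ['b', 'a'],
        ['c', 'nan'],
    ]
)

# axis=0 treats each row as one item, so counts refers to whole rows.
rows, counts = np.unique(data, axis=0, return_counts=True)

threshold = 1
# Keep every distinct row that occurs more than `threshold` times.
repeated_rows = rows[counts > threshold]
print(repeated_rows)
```

With this data the row-wise count keeps ['a', 'a'], ['a', 'b'] and ['d', 'c'], which is one row more than the expected result in the question, so which variant fits depends on what "most frequent" is meant to cover.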
