ufunc (min, max, mean, etc) on structured (record) arrays with different dtype

Question

I am using numpy (1.20.3) with Python (3.8) and trying to apply simple functions to a structured array whose fields have different dtypes.

import numpy

def test_large_record():
    # numpy.float and numpy.int are deprecated aliases; use explicit dtypes
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    print(rec_array.min())

This results in "TypeError: cannot perform reduce with flexible type".

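I tried to write a generator that walks through a generic structured array and yields, for each dtype, a view of all the fields sharing that dtype, but it does not seem to work.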

def rec_homogeneous_generator(rec_array):
    dtype = {}

    for name, dt in rec_array.dtype.descr:
        if dt not in dtype.keys():
            dtype[dt] = []

        dtype[dt].append(name)

    for dt, cols in dtype.items():
        r = rec_array[cols]
        v = r.view(dt)
        yield v


def test_large_record():
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    for h_array in rec_homogeneous_generator(rec_array):
        print(h_array.min(axis=0))

The results are 0.0 and 0, which is not what I expected. I should get [0, 0.01] and 1.

Does anyone have a good idea?

Answer 1

Operate on one field at a time:

In [21]: [rec_array[field].min() for field in rec_array.dtype.fields]
Out[21]: [0.0, 0.01, 1]
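
The per-field approach can be wrapped in a small helper that applies any reduction to each field and keys the result by field name. This is only a sketch: `reduce_fields` is a hypothetical name, not a numpy function, and the sample data is copied from the question.

```python
import numpy as np

def reduce_fields(arr, func):
    """Apply func to each field of a structured array (hypothetical helper)."""
    # arr.dtype.names lists the field names in declaration order
    return {name: func(arr[name]) for name in arr.dtype.names}

x = np.array([0.0, 0.2, 0.3], dtype=np.float64)
x_2 = np.array([0.01, 0.12, 0.82], dtype=np.float64)
y = np.array([1, 5, 7], dtype=np.int64)
rec_array = np.rec.fromarrays(
    [x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

mins = reduce_fields(rec_array, np.min)
maxs = reduce_fields(rec_array, np.max)
print(mins)
print(maxs)
```

Because each field is reduced separately, mixed dtypes are not a problem, and the same helper works with `np.mean` or any other function that accepts a plain array.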

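With multi-field indexing in recent numpy versions, your generator yields views that still span every byte of each record, so the float view reinterprets the integer field as floats and the integer view reinterprets the float fields as integers: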

In [23]: list(rec_homogeneous_generator(rec_array))
Out[23]: 
[rec.array([0.0e+000, 1.0e-002, 4.9e-324, 2.0e-001, 1.2e-001, 2.5e-323,
            3.0e-001, 8.2e-001, 3.5e-323],
           dtype=float64),
 rec.array([                  0, 4576918229304087675,                   1,
            4596373779694328218, 4593311331947716280,                   5,
            4599075939470750515, 4605561122934164029,                   7],
           dtype=int64)]
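
A quick way to see where the extra values come from is to inspect the item size of the multi-field selection. The sketch below assumes numpy 1.16 or later, where multi-field indexing returns a view that keeps the original record layout; the data is copied from the question.

```python
import numpy as np

rec_array = np.rec.fromarrays(
    [np.array([0.0, 0.2, 0.3]),
     np.array([0.01, 0.12, 0.82]),
     np.array([1, 5, 7], dtype=np.int64)],
    dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

sub = rec_array[['x', 'x_2']]
# Only two 8-byte fields are named, but each record is still 24 bytes wide
print(sub.dtype.itemsize)

# Viewing as float64 therefore yields three values per record, not two
flat = sub.view('<f8')
print(flat.shape)
```

The third value in every record is the `y` integer reinterpreted as a float, which is why tiny numbers such as 4.9e-324 show up in the output above.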

The multi-field index on its own:

In [25]: rec_array[['x','x_2']]
Out[25]: 
rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
          dtype={'names':['x','x_2'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':24})

repack_fields gives a cleaner result by dropping the unused bytes:

In [26]: import numpy.lib.recfunctions as rf
In [28]: rf.repack_fields(rec_array[['x','x_2']])
Out[28]: 
rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
          dtype=[('x', '<f8'), ('x_2', '<f8')])

Now we can view it as float:

In [29]: rf.repack_fields(rec_array[['x','x_2']]).view(float)
Out[29]: 
rec.array([0.  , 0.01, 0.2 , 0.12, 0.3 , 0.82],
          dtype=float64)

This view is 1d.

Or better:

In [30]: rf.structured_to_unstructured(rec_array[['x','x_2']])
Out[30]: 
rec.array([[0.  , 0.01],
           [0.2 , 0.12],
           [0.3 , 0.82]],
          dtype=float64)

These functions are documented on the structured arrays page.
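
Putting the pieces together, the generator from the question could be rewritten along these lines. This is a sketch rather than a definitive implementation: `homogeneous_groups` is a hypothetical name, fields are grouped by their dtype string, and `structured_to_unstructured` is used so that only the selected fields end up in each 2-D array.

```python
import numpy as np
import numpy.lib.recfunctions as rf

def homogeneous_groups(arr):
    """Yield (dtype string, 2-D plain array) per group of same-dtype fields."""
    groups = {}
    for name in arr.dtype.names:
        # arr.dtype[name].str is e.g. '<f8' or '<i8'
        groups.setdefault(arr.dtype[name].str, []).append(name)
    for dt, cols in groups.items():
        # Copies just the selected fields into a (n_records, n_fields) array;
        # np.asarray strips the recarray subclass from the result
        yield dt, np.asarray(rf.structured_to_unstructured(arr[cols]))

x = np.array([0.0, 0.2, 0.3], dtype=np.float64)
x_2 = np.array([0.01, 0.12, 0.82], dtype=np.float64)
y = np.array([1, 5, 7], dtype=np.int64)
rec_array = np.rec.fromarrays(
    [x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

results = dict(homogeneous_groups(rec_array))
for dt, block in results.items():
    # axis=0 reduces over records, leaving one value per field
    print(dt, block.min(axis=0))
```

Each yielded array has one column per field, so `min(axis=0)`, `max(axis=0)` or `mean(axis=0)` return one value per field within the group.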
