Maintaining numpy subclass inside a container after applying ufunc

Question

Following numpy's documentation, I created a class derived from numpy's ndarray. It looks like this (with the number of attributes reduced to keep it readable):

import numpy as np

class Atom3D( np.ndarray ):
    __array_priority__ = 11.0

    def __new__( cls, idnum, coordinates):

        # Cast numpy to be our class type
        assert len(coordinates) == 3
        obj = np.asarray(coordinates, dtype= np.float64).view(cls)
        # add the new attribute to the created instance
        obj._number = int(idnum)
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__( self, obj ):
        self._number = getattr(obj, '_number', None)

    def __array_wrap__( self, out_arr, context=None ):
        return np.ndarray.__array_wrap__(self, out_arr, context)

    def __repr__( self ):
        return "{0._number}: ({0[0]:8.3f}, {0[1]:8.3f}, {0[2]:9.3f})".format(self)

When I run a test that applies numpy's ufuncs to the object:

a1 = Atom3D(1, [5., 5., 5.])
print type(a1), repr(a1)
m  = np.identity(3)
a2 = np.dot(a1, m)
print type(a2), repr(a2)

I get the expected result; that is, the dot function preserves the subclassing of the object:

<class '__main__.Atom3D'> 1: (   5.000,    5.000,     5.000)  
<class '__main__.Atom3D'> 1: (   5.000,    5.000,     5.000)

However, when I try to apply the same np.dot to an array of these objects, the subclass is lost. Thus, executing:

print "regular"
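# a2 and a3 are not redefined in the original snippet; these values
# match the output below and the later subtraction example
a1 = Atom3D(1, [5., 5., 5.])
a2 = Atom3D(2, [6., 4., 2.])
a3 = Atom3D(3, [8., 6., 8.])
atom_list1 = [a1, a2, a3]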
atom_list2 = np.dot(atom_list1, m)
for _ in atom_list2:
    print type(_), repr(_)

print "numpy array"
atom_list1 = np.array([a1, a2, a3], dtype=np.object)
atom_list2 = np.dot(atom_list1, m)
for _ in atom_list2:
    print type(_), repr(_)

gives me this:

regular
<type 'numpy.ndarray'> array([ 5.,  5.,  5.])
<type 'numpy.ndarray'> array([ 6.,  4.,  2.])
<type 'numpy.ndarray'> array([ 8.,  6.,  8.])
numpy array
<type 'numpy.ndarray'> array([5.0, 5.0, 5.0], dtype=object)
<type 'numpy.ndarray'> array([6.0, 4.0, 2.0], dtype=object)
<type 'numpy.ndarray'> array([8.0, 6.0, 8.0], dtype=object)

The same goes for other operations, such as __sub__:

print "regular"
a1 = Atom3D(1, [5., 5., 5.])
a2 = a1 - np.array([3., 2., 0.])
print type(a2), repr(a2)
print "numpy array"
a1 = Atom3D(1, [5., 5., 5.])
a2 = Atom3D(2, [6., 4., 2.])
a3 = Atom3D(3, [8., 6., 8.])
atom_list1 = np.array([a1, a2, a3], dtype=np.object)
atom_list2 = atom_list1 - np.array([3., 2., 0.])
for _ in atom_list2:
    print type(_), repr(_)

will produce:

regular
<class '__main__.Atom3D'> 1: (   2.000,    3.000,     5.000)
numpy array
<type 'numpy.ndarray'> array([2.0, 3.0, 5.0], dtype=object)
<type 'numpy.ndarray'> array([3.0, 2.0, 2.0], dtype=object)
<type 'numpy.ndarray'> array([5.0, 4.0, 8.0], dtype=object)

I have been looking around, but I haven't found what I am doing wrong.
Thanks!!

J.-

Answer 1

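There is no such thing as dtype=Atom3D. The same goes for dtype=list and dtype=np.ndarray. Each of these creates a dtype=object array, in which every element is a pointer to an object stored elsewhere in memory.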

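Creating an object array with np.array(...) can be tricky. np.array evaluates the entries and makes some choices of its own. If you want full control over the elements that go into an object array, the best approach is to create a 'blank' one and assign the elements yourself.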

In [508]: A=np.array([np.matrix([1,2]),np.matrix([2,1])],dtype=object)

In [509]: A       # a 3d array, no matrix subarrays
Out[509]: 
array([[[1, 2]],

       [[2, 1]]], dtype=object)

In [510]: A=np.empty((2,),dtype=object)

In [511]: A
Out[511]: array([None, None], dtype=object)

In [512]: A[:]=[np.matrix([1,2]),np.matrix([2,1])]

In [513]: A
Out[513]: array([matrix([[1, 2]]), matrix([[2, 1]])], dtype=object)

Unless you really need the object array for things like reshaping and transposing, you are usually better off working with a list.

Mixing object types also works:

In [522]: A=np.asarray([np.matrix([1,2]),np.ma.masked_array([2,1])],dtype=np.object)

In [523]: A
Out[523]: 
array([matrix([[1, 2]]),
       masked_array(data = [2 1],
             mask = False,
       fill_value = 999999)
], dtype=object)

============================

When you do np.dot([a1,a2,a3],m), it first converts any lists to arrays with np.asarray([a1,a2,a3]). The result is a 2d array, not an array of Atom3D objects. So the dot is the usual array dot.
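To make the difference concrete, here is a minimal sketch. It assumes a trimmed-down Atom3D (no `__repr__` or `__array_wrap__`), and the names `flat` and `boxed` are mine, not from the question. It fills the object array one element at a time, which is a more conservative variant of the slice assignment used elsewhere in this answer.

```python
import numpy as np

class Atom3D(np.ndarray):
    """Trimmed-down stand-in for the class in the question."""
    __array_priority__ = 11.0

    def __new__(cls, idnum, coordinates):
        obj = np.asarray(coordinates, dtype=np.float64).view(cls)
        obj._number = int(idnum)
        return obj

    def __array_finalize__(self, obj):
        self._number = getattr(obj, '_number', None)

a1 = Atom3D(1, [5., 5., 5.])
a2 = Atom3D(2, [6., 4., 2.])
a3 = Atom3D(3, [8., 6., 8.])

# np.asarray treats the three length-3 atoms as rows of one 2-d array,
# so the result is a plain float ndarray and the Atom3D wrappers are gone.
flat = np.asarray([a1, a2, a3])
print(type(flat), flat.shape, flat.dtype)

# Pre-allocating a 1-d object array and filling it element by element
# stores each Atom3D as-is.
boxed = np.empty((3,), dtype=object)
for i, atom in enumerate([a1, a2, a3]):
    boxed[i] = atom
print(boxed.shape, [type(x).__name__ for x in boxed])
```

The first container has lost the subclass before np.dot ever runs, which is why the question's "regular" case prints plain ndarrays.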

If I create the object array as I suggested:

In [14]: A=np.empty((3,),dtype=object)
In [16]: A[:]=[a1,a2,a1+a2]

In [17]: A
Out[17]: 
array([1: (   5.000,    5.000,     5.000),
       1: (   5.000,    5.000,     5.000),
       1: (  10.000,   10.000,    10.000)], dtype=object)

In [18]: np.dot(A,m)
Out[18]: 
array([1: (   5.000,    5.000,     5.000),
       1: (   5.000,    5.000,     5.000),
       1: (  10.000,   10.000,    10.000)], dtype=object)

The Atom3D type is preserved.

The same goes for subtraction:

In [23]: A- np.array([3.,2., 0])
Out[23]: 
array([1: (   2.000,    2.000,     2.000),
       1: (   3.000,    3.000,     3.000),
       1: (  10.000,   10.000,    10.000)], dtype=object)

Adding that array and an Atom3D works, though there is something wrong with displaying the result:

In [39]: x = A + a2

In [40]: x
Out[40]: <repr(<__main__.Atom3D at 0xb5062294>) failed: TypeError: non-empty format string passed to object.__format__>

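Calculations with object dtype arrays are hit and miss. Some work, apparently by iterating over the elements of the array, applying the function, and turning the result back into an object array. In effect, it is an array version of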

 [func(a, x) for a in A]

Even when it works, it isn't performing a fast compiled operation; it is iterative (timings will be similar to the list equivalent).
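Since the object array iterates anyway, the explicit list version is often the simplest way to keep the subclass. The following is a sketch only, assuming the same trimmed-down Atom3D as in the question (without `__repr__`); `rotated` and `shifted` are illustrative names.

```python
import numpy as np

class Atom3D(np.ndarray):
    """Trimmed-down stand-in for the class in the question."""
    __array_priority__ = 11.0

    def __new__(cls, idnum, coordinates):
        obj = np.asarray(coordinates, dtype=np.float64).view(cls)
        obj._number = int(idnum)
        return obj

    def __array_finalize__(self, obj):
        self._number = getattr(obj, '_number', None)

atoms = [Atom3D(1, [5., 5., 5.]),
         Atom3D(2, [6., 4., 2.]),
         Atom3D(3, [8., 6., 8.])]
m = np.identity(3)
shift = np.array([3., 2., 0.])

# Apply the operation to one atom at a time; each call sees a single
# Atom3D, so the result is an Atom3D as in the question's single-object test.
rotated = [np.dot(a, m) for a in atoms]
shifted = [a - shift for a in atoms]

for a in shifted:
    print(type(a).__name__, a._number, np.asarray(a))
```

This is no faster than the object array route, but nothing is converted behind your back, and each atom is shifted by the full vector rather than by a single entry of it.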

Others don't work:

In [41]: a1>0
Out[41]: 1: (   1.000,    1.000,     1.000)

In [42]: A>0
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

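We have pointed out a number of times that object dtype arrays are little more than glorified lists. As with a list, the elements are pointers, so operations involve iterating over those pointers in Python, not in C. This is not a highly developed corner of the numpy code.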
