View of numpy structured array with offsets
I have the following numpy structured array:
In [250]: x
Out[250]:
array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000),
(22, 2, 804846, 2000), (44, 2, 800, 4000), (55, 5, 900, 5000),
(55, 5, 1000, 5000), (55, 5, 8900, 5000), (55, 5, 11400, 5000),
(33, 3, 14500, 3000), (33, 3, 40550, 3000), (33, 3, 40990, 3000),
(33, 3, 44400, 3000)],
dtype=[('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4')])
The array below is a subset of the one above (and is also a view):
In [251]: fields=['f1','f3']
In [252]: y=x.getfield(np.dtype(
...: {name: x.dtype.fields[name] for name in fields}
...: ))
In [253]: y
Out[253]:
array([(22, -1000000000), (22, 400), (22, 804846), (44, 800), (55, 900),
(55, 1000), (55, 8900), (55, 11400), (33, 14500), (33, 40550),
(33, 40990), (33, 44400)],
dtype={'names':['f1','f3'], 'formats':['<i4','<i4'], 'offsets':[0,8], 'itemsize':12})
I am trying to convert y to a regular numpy array, and I want that array to be a view. The problem is that the following gives me an error:
In [254]: y.view(('<i4',2))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-254-88440f106a89> in <module>()
----> 1 y.view(('<i4',2))
C:\numpy\core\_internal.pyc in _view_is_safe(oldtype, newtype)
499
500 # raises if there is a problem
--> 501 _check_field_overlap(new_fieldtile, old_fieldtile)
502
503 # Given a string containing a PEP 3118 format specifier,
C:\numpy\core\_internal.pyc in _check_field_overlap(new_fields, old_fields)
402 old_bytes.update(set(range(off, off+tp.itemsize)))
403 if new_bytes.difference(old_bytes):
--> 404 raise TypeError("view would access data parent array doesn't own")
405
406 #next check that we do not interpret non-Objects as Objects, and vv
TypeError: view would access data parent array doesn't own
However, if I choose contiguous fields, it works:
In [255]: fields=['f1','f2']
...:
...: y=x.getfield(np.dtype(
...: {name: x.dtype.fields[name] for name in fields}
...: ))
...:
In [256]: y
Out[256]:
array([(22, 2), (22, 2), (22, 2), (44, 2), (55, 5), (55, 5), (55, 5),
(55, 5), (33, 3), (33, 3), (33, 3), (33, 3)],
dtype=[('f1', '<i4'), ('f2', '<i4')])
In [257]: y.view(('<i4',2))
Out[257]:
array([[22, 2],
[22, 2],
[22, 2],
[44, 2],
[55, 5],
[55, 5],
[55, 5],
[55, 5],
[33, 3],
[33, 3],
[33, 3],
[33, 3]])
View casting does not seem to work when the fields are not contiguous. Is there an alternative?
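The following is a bit confusing, but the gist is that for this kind of view to work, it has to be able to access the fields with regular array strides and shape. Getting a view from ['f1','f3'] fails for basically the same reason that np.ones((12,4))[:,[0,2]] produces a copy.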
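A minimal sketch of that analogy, using `np.shares_memory` to tell a view from a copy. The variable names are only illustrative:

```python
import numpy as np

a = np.ones((12, 4))

fancy = a[:, [0, 2]]    # advanced (list) indexing: always a copy
sliced = a[:, 0:3:2]    # basic slicing with a step: a view of columns 0 and 2

print(np.shares_memory(a, fancy))    # False
print(np.shares_memory(a, sliced))   # True
print(sliced.strides)                # (32, 16): regular strides, so no copy is needed
```

The slice can stay a view because columns 0 and 2 are reachable with one constant column stride; an arbitrary list of columns generally is not.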
========
In your structured array, each record is stored as four 'i4' values (16 bytes). That layout is compatible with an (n,4) 'i4' array:
In [381]: x.__array_interface__['data'] # databuffer pointer
Out[381]: (160925352, False)
In [382]: x.view(('i',4)).__array_interface__['data']
Out[382]: (160925352, False) # same buffer
In [387]: x.view(('i',4)).shape
Out[387]: (12, 4)
But when I take various slices of this array:
In [383]: x.view(('i',4))[:,[0,1]].__array_interface__['data']
Out[383]: (169894184, False) # advanced indexing - a copy
In [384]: x.view(('i',4))[:,:2].__array_interface__['data']
Out[384]: (160925352, False) # same buffer
But selecting ['f1','f3'] is equivalent to x.view(('i',4))[:,[0,2]], which is another copy.
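To make the copy/view difference concrete, here is a small sketch on a shortened version of the question's array. It assumes a C-contiguous `x`, and uses `x.view('<i4').reshape(...)` to get the (n, 4) integer layout:

```python
import numpy as np

# A shortened version of the question's array (3 records, same dtype).
x = np.array([(22, 2, -1000000000, 2000),
              (44, 2, 800, 4000),
              (55, 5, 900, 5000)],
             dtype=[('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4')])

cols = x.view('<i4').reshape(len(x), 4)   # (n, 4) integer view of the records

picked = cols[:, [0, 2]]   # same values as fields f1 and f3, but a copy
picked[0, 1] = 7           # does not touch x
print(x['f3'][0])          # -1000000000

stepped = cols[:, ::2]     # columns 0 and 2 via a slice: a view
stepped[0, 1] = 7          # writes through to x
print(x['f3'][0])          # 7
```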
Or look at the strides. With the first 2 fields:
In [404]: y2=x.getfield(np.dtype({name: x.dtype.fields[name] for name in ['f1','f2']}))
In [405]: y2.dtype
Out[405]: dtype([('f1', '<i4'), ('f2', '<i4')])
In [406]: y2.strides
Out[406]: (16,)
In [407]: y2.view(('i',2)).strides
Out[407]: (16, 4)
To view this array as plain integers, it can step through rows by 16 bytes and through columns by 4 bytes, and it only needs 2 columns.
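The stride arithmetic can be checked by hand. The sketch below reads the raw bytes of a shortened version of the array and locates element (i, j) at byte `i*16 + j*4`; the helper `element` is hypothetical and only there for illustration:

```python
import struct
import numpy as np

x = np.array([(22, 2, -1000000000, 2000),
              (44, 2, 800, 4000),
              (55, 5, 900, 5000)],
             dtype=[('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4')])

raw = x.tobytes()                 # 3 records * 16 bytes each
row_stride, col_stride = 16, 4    # the strides reported for y2.view(('i',2))

def element(i, j):
    """Read the little-endian int32 at row i, column j using the strides."""
    (value,) = struct.unpack_from('<i', raw, i * row_stride + j * col_stride)
    return value

first_two = [[element(i, j) for j in range(2)] for i in range(len(x))]
print(first_two)   # [[22, 2], [44, 2], [55, 5]]
```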
Or look at the full dictionary for the 4-column and 2-column cases:
In [409]: x.view(('i',4)).__array_interface__
Out[409]:
{'data': (160925352, False),
'descr': [('', '<i4')],
'shape': (12, 4),
'strides': None,
'typestr': '<i4',
'version': 3}
In [410]: y2.view(('i',2)).__array_interface__
Out[410]:
{'data': (160925352, False),
'descr': [('', '<i4')],
'shape': (12, 2),
'strides': (16, 4),
'typestr': '<i4',
'version': 3}
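The strides and dtype are the same (strides of None means C-contiguous, which is (16, 4) here); only the shape differs. The y2 case works because it can reach the bytes it needs with strides while ignoring the other 2 columns.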
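As a small check of how `__array_interface__` reports strides, here is a sketch on a plain integer array of the same shape (a stand-in, not the question's data):

```python
import numpy as np

full = np.zeros((12, 4), dtype='<i4')
left = full[:, :2]                      # first 2 columns, still a view

full_iface_strides = full.__array_interface__['strides']
left_iface_strides = left.__array_interface__['strides']

print(full_iface_strides)   # None: C-contiguous, strides are implied by the shape
print(full.strides)         # (16, 4)
print(left_iface_strides)   # (16, 4): same strides, spelled out because of the gaps
```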
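If I slice out 2 columns of the 4-column case that do not start at column 0 (here the last 2), I get a view: the same data buffer, but with an offset: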
In [385]: x.view(('i',4))[:,2:4].__array_interface__['data']
Out[385]: (160925360, False)
But using getfield on 2 fields that do not start at offset 0, here ['f2','f3'], fails much as ['f1','f3'] did:
In [388]: y2=x.getfield(np.dtype({name: x.dtype.fields[name] for name in ['f2','f3']})).view(('i',2))
...
ValueError: new type not compatible with array.
view cannot implement the data buffer offset that slicing can.
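The offset that slicing applies can be seen by subtracting data pointers. A sketch, assuming a zero-filled array with the question's dtype:

```python
import numpy as np

x = np.zeros(12, dtype=[('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4')])
cols = x.view('<i4').reshape(len(x), 4)

base = x.__array_interface__['data'][0]

# Slicing moves the data pointer forward by (first column) * 4 bytes.
off_middle = cols[:, 1:3].__array_interface__['data'][0] - base
off_last = cols[:, 2:4].__array_interface__['data'][0] - base

print(off_middle)   # 4
print(off_last)     # 8, matching 160925360 - 160925352 in Out[385]
```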
========
Looking again at the middle 2 fields:
In [412]: y2=x.getfield(np.dtype({name: x.dtype.fields[name] for name in ['f2','f3']}))
...:
In [413]: y2
Out[413]:
array([(2, -1000000000), (2, 400), (2, 804846), (2, 800), (5, 900),
(5, 1000), (5, 8900), (5, 11400), (3, 14500), (3, 40550),
(3, 40990), (3, 44400)],
dtype={'names':['f2','f3'], 'formats':['<i4','<i4'], 'offsets':[4,8], 'itemsize':12})
In [414]: y2.__array_interface__['data']
Out[414]: (160925352, False)
y2 points to the start of the original data buffer. It implements the offset through the dtype offsets. Compare that with the offset in In[385].
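The same contrast can be sketched with multi-field indexing, which (assuming NumPy 1.16 or later, where it returns a view) also keeps the offsets in the dtype rather than in the data pointer:

```python
import numpy as np

x = np.zeros(3, dtype=[('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4')])

sub = x[['f2', 'f3']]   # assumption: NumPy >= 1.16, so this is a view

# The field offsets live in the dtype ...
field_offsets = (sub.dtype.fields['f2'][1], sub.dtype.fields['f3'][1])
print(field_offsets)    # (4, 8)

# ... while the data pointer is still the start of x's buffer.
same_start = sub.__array_interface__['data'][0] == x.__array_interface__['data'][0]
print(same_start)       # True
```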
Yes, use the ndarray constructor directly:
import numpy as np

x = np.array([(22, 2, -1000000000, 2000),
              (22, 2, 400, 2000),
              (22, 2, 804846, 2000),
              (44, 2, 800, 4000),
              (55, 5, 900, 5000),
              (55, 5, 1000, 5000)],
             dtype=[('f1','i'),('f2','i'),('f3','i'),('f4','i')])
fields = ['f4', 'f1']
shape = x.shape + (len(fields),)
# byte offset of each selected field within a record
offsets = [x.dtype.fields[name][1] for name in fields]
# the selected fields must be equally spaced so that one stride describes them
assert not any(np.diff(offsets, n=2))
strides = x.strides + (offsets[1] - offsets[0],)
y = np.ndarray(shape=shape, dtype='i', buffer=x,
               offset=offsets[0], strides=strides)
print(repr(y))
This gives:
array([[2000, 22],
[2000, 22],
[2000, 22],
[4000, 44],
[5000, 55],
[5000, 55]])
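By the way, when all fields in the original array have the same dtype, it is much easier to first create a view on that array and then slice it. For the same result as above: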
y = x.view('i').reshape(x.shape + (-1,))[:,-1::-3]
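A quick sketch to confirm that both approaches produce the same values and that both are views. It uses a shortened array and hard-codes the offset and strides (12 and (16, -12)) that the constructor code computes for ['f4', 'f1'] with 4-byte fields:

```python
import numpy as np

x = np.array([(22, 2, -1000000000, 2000),
              (44, 2, 800, 4000),
              (55, 5, 900, 5000)],
             dtype=[('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4')])

# Approach 1: ndarray constructor over x's buffer, columns f4 then f1.
y_ctor = np.ndarray(shape=(len(x), 2), dtype='<i4', buffer=x,
                    offset=12, strides=(16, -12))

# Approach 2: integer view, reshape to (n, 4), then a reversed strided slice.
y_slice = x.view('<i4').reshape(x.shape + (-1,))[:, -1::-3]

print(np.array_equal(y_ctor, y_slice))   # True

# Writing to x is visible through both results, so neither is a copy.
x['f1'][0] = -1
print(y_ctor[0, 1], y_slice[0, 1])       # -1 -1
```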