How do I get the strides from a dtype in numpy?

Question

I think I can do: np.zeros((), dtype=dt).strides, but this doesn't seem efficient when the dtype is a large array type such as ('<f8', (200, 100)). Is there a way of going directly from a dtype to strides in numpy?

Answer 1

I think you are talking about an array like this:

In [257]: dt=np.dtype([('f0',float, (200,100))])
In [258]: x=np.zeros((),dtype=dt)

The array itself is 0d, with a single item.

In [259]: x.strides
Out[259]: ()

The shape and strides of that item are determined by the dtype:

In [260]: x['f0'].strides
Out[260]: (800, 8)
In [261]: x['f0'].shape
Out[261]: (200, 100)

But is constructing x any different from constructing a plain float array with the same shape?

In [262]: y=np.zeros((200,100),float)
In [263]: y.strides
Out[263]: (800, 8)

You can't get the strides of a potential y without actually constructing it.

The IPython whos command shows x and y taking up about the same space:

x          ndarray       : 1 elems, type `[('f0', '<f8', (200, 100))]`,
   160000 bytes (156.25 kb)
y          ndarray       200x100: 20000 elems, type `float64`, 
   160000 bytes (156.25 kb)

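An interesting question is whether such an x['f0'] has all the properties of y. You can probably read all of them, but you may be limited in which ones you can change.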
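A quick way to probe that question is to compare the field view with a plain array directly. This is only a sketch: it uses a smaller (20, 10) sub-array than the example above to keep the allocation cheap, and it checks just a handful of attributes rather than all of them.

```python
import numpy as np

dt = np.dtype([('f0', '<f8', (20, 10))])
x = np.zeros((), dtype=dt)
y = np.zeros((20, 10), dtype='<f8')

field = x['f0']  # a view into x's single item, not a copy

# Shape, strides and element type of the view match the plain array.
same_layout = (field.shape == y.shape
               and field.strides == y.strides
               and field.dtype == y.dtype)
print(same_layout)

# One visible difference: the view borrows x's buffer, y owns its own.
print(field.flags['OWNDATA'])
print(y.flags['OWNDATA'])
```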

You can parse the dtype:

In [309]: dt=np.dtype([('f0',float, (200,100))])
In [310]: dt.fields
Out[310]: mappingproxy({'f0': (dtype(('<f8', (200, 100))), 0)})
In [311]: dt[0]
Out[311]: dtype(('<f8', (200, 100)))
In [312]: dt[0].shape
Out[312]: (200, 100)
In [324]: dt[0].base
Out[324]: dtype('float64')

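I don't see a strides-like attribute on dt or dt[0]. There may be some numpy function that calculates strides from a shape, but it is probably hidden. You could search the np.lib.stride_tricks module, which is where as_strided lives.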

From the (200,100) shape, and float64 taking 8 bytes, you can calculate that the normal (default) strides are (8*100, 8).

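For a dtype without further nesting, this seems to work (it relies on the sub-array shape being 2d; more dimensions need a cumulative product):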

In [374]: dt[0]
Out[374]: dtype(('<f8', (200, 100)))
In [375]: tuple(np.array(dt[0].shape[1:]+(1,))*dt[0].base.itemsize)
Out[375]: (800, 8)
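The expression above only handles a 2d sub-array shape. A minimal sketch for any number of dimensions, assuming the default C-contiguous layout, could look like this. Note that `c_contiguous_strides` is a hypothetical helper name used here for illustration, not a numpy function.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-ordered array with this shape and itemsize."""
    strides = []
    step = itemsize
    # Walk from the last (fastest-varying) axis to the first; each axis
    # steps over everything that the axes after it cover.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

print(c_contiguous_strides((200, 100), 8))  # (800, 8)
```

With the dtype from the session above, the arguments would be dt[0].shape and dt[0].base.itemsize.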

Let's make a more complex array with this dtype:

In [346]: x=np.zeros((3,1),dtype=dt)
In [347]: x.shape
Out[347]: (3, 1)
In [348]: x.strides
Out[348]: (160000, 160000)

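Its strides depend on the shape and the itemsize. But the shape and strides of a field are 4d. Can we say that they exist without actually accessing the field?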

In [349]: x['f0'].strides
Out[349]: (160000, 160000, 800, 8)

Strides for a single item:

In [350]: x[0,0]['f0'].strides
Out[350]: (800, 8)

What about double nesting?

In [390]: dt1=np.dtype([('f0',np.dtype([('f00',int,(3,4))]), (20,10))])
In [391]: z=np.zeros((),dt1)
In [392]: z['f0']['f00'].shape
Out[392]: (20, 10, 3, 4)
In [393]: z['f0']['f00'].strides
Out[393]: (480, 48, 16, 4)
In [399]: (np.cumprod(np.array((10,3,4,1))[::-1])*4)[::-1]
Out[399]: array([480,  48,  16,   4], dtype=int32)

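Correction: the strides of a field are a combination of the strides of the array as a whole plus the strides of the field. This can be seen with a multi-field dtype: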

In [430]: dt=np.dtype([('f0',float, (3,4)),('f1',int),('f2',int,(2,))])
In [431]: x=np.zeros((3,2),dt)
In [432]: x.shape
Out[432]: (3, 2)
In [433]: x.strides
Out[433]: (216, 108)
In [434]: x['f0'].shape
Out[434]: (3, 2, 3, 4)
In [435]: x['f0'].strides
Out[435]: (216, 108, 32, 8)

(216,108) are the strides of the whole array (itemsize 108), concatenated with the strides of the f0 field, (32,8) (itemsize 8).
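The concatenation described above can be predicted without allocating the array. The following is a sketch under two assumptions: the outer array is C-contiguous, and `field_strides` and `c_strides` are hypothetical helper names, not part of numpy. It uses explicit '<i4' and '<f8' types so the item size is 108 on any platform.

```python
import numpy as np

def c_strides(shape, itemsize):
    """Byte strides of a C-ordered array with this shape and itemsize."""
    strides, step = [], itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def field_strides(array_shape, dtype, name):
    """Predict arr[name].strides for a C-contiguous array of `dtype`."""
    dtype = np.dtype(dtype)
    field_dtype = dtype.fields[name][0]
    # Outer part: stepping between whole records of the structured dtype.
    outer = c_strides(array_shape, dtype.itemsize)
    # Inner part: stepping inside the field's own (C-ordered) sub-array.
    inner = c_strides(field_dtype.shape, field_dtype.base.itemsize)
    return outer + inner

dt = np.dtype([('f0', '<f8', (3, 4)), ('f1', '<i4'), ('f2', '<i4', (2,))])
print(field_strides((3, 2), dt, 'f0'))  # (216, 108, 32, 8)
```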

Answer 2

You can actually get the strides of a sub-array within a structured array without creating the "full" array.

Sub-arrays within a structured array are required to be contiguous and in C-order, according to the documentation. Note the sentence above the first example:

Sub-arrays always have a C-contiguous memory layout.

Therefore, for a sub-array dtype with no fields, such as the one in your example, you can do this (as an unreadable one-liner):

import numpy as np

x = np.dtype(('<f8', (200, 100)))

strides = x.base.itemsize * np.r_[1, np.cumprod(x.shape[::-1][:-1])][::-1]

Avoiding the code golf:

shape = list(x.shape)

# First, let's make the strides for an array with an itemsize of 1 in C-order
tmp_strides = shape[::-1]
tmp_strides[1:] = list(np.cumprod(tmp_strides[:-1]))
tmp_strides[0] = 1

# Now adjust it for the real itemsize:
tmp_strides = x.base.itemsize * np.array(tmp_strides)

# And convert it to a tuple, reversing it back for proper C-order
strides = tuple(tmp_strides[::-1])

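However, this gets more complex when there are multiple fields. In general you would need to put in appropriate checks. For example: Does the dtype have a shape attribute? Does it have fields? Do any of the fields have shape attributes?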
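The checks listed above can be sketched as a recursive walk over the dtype. This is an illustration rather than a complete solution: `leaf_layouts` and `c_strides` are hypothetical names, the result describes a 0d array of the dtype (as in the question), and it relies on sub-arrays being C-contiguous as the documentation states.

```python
import numpy as np

def c_strides(shape, itemsize):
    """Byte strides of a C-ordered array with this shape and itemsize."""
    strides, step = [], itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def leaf_layouts(dtype, path=(), shape=(), strides=()):
    """Yield (field_path, shape, strides) for each leaf field of `dtype`."""
    dtype = np.dtype(dtype)
    # Does the dtype have a shape? Then it is a sub-array dtype.
    while dtype.subdtype is not None:
        base, sub_shape = dtype.subdtype
        shape = shape + sub_shape
        strides = strides + c_strides(sub_shape, base.itemsize)
        dtype = base
    # Does it have fields? If not, this is a leaf.
    if dtype.names is None:
        yield path, shape, strides
        return
    # Otherwise recurse; each field may itself have a shape or fields.
    for name in dtype.names:
        field_dtype = dtype.fields[name][0]
        yield from leaf_layouts(field_dtype, path + (name,), shape, strides)

dt1 = np.dtype([('f0', np.dtype([('f00', '<i4', (3, 4))]), (20, 10))])
for path, shape, strides in leaf_layouts(dt1):
    print(path, shape, strides)
```

For dt1 this reports the path ('f0', 'f00') with shape (20, 10, 3, 4) and strides (480, 48, 16, 4), which is what z['f0']['f00'] shows for a 0d array of that dtype.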
