Out of core 4D image tif storage as hdf5 python

Question

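I have 27GB of 2D tiff files that represent slices of a movie of 3D images. I want to be able to slice this data as if it were a simple numpy 4D array. It looks like dask.array is a good tool for cleanly manipulating the array once it's stored in memory as an hdf5 file.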

How can I store these files as an hdf5 file in the first place if they do not all fit into memory? I am new to h5py and to databases in general.

Thanks.

Answer 1

Edit: Use `dask.array`'s `imread` function

As of dask 0.7.0 you don't need to store your images in HDF5. Use the `imread` function from `dask.array.image` directly instead:

In [1]: from skimage.io import imread

In [2]: im = imread('foo.1.tiff')

In [3]: im.shape
Out[3]: (5, 5, 3)

In [4]: ls foo.*.tiff
foo.1.tiff  foo.2.tiff  foo.3.tiff  foo.4.tiff

In [5]: from dask.array.image import imread

In [6]: im = imread('foo.*.tiff')

In [7]: im.shape
Out[7]: (4, 5, 5, 3)
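
In the question each file is a single 2D slice, so a stack loaded this way would presumably have shape (n_files, y, x) rather than four dimensions. Below is a minimal sketch of regrouping such a stack into (t, z, y, x). It assumes the files sort in time-major order and that every time point has the same number of z slices. The helper `as_movie` and the parameter `n_z` are hypothetical names, and a small numpy array stands in for the lazily loaded tiff stack.

```python
import numpy as np
import dask.array as da


def as_movie(stack, n_z):
    """Regroup a (n_files, y, x) stack into (t, z, y, x).

    Assumes files are ordered time-major: all z slices of time point 0,
    then all z slices of time point 1, and so on.
    """
    n_files = stack.shape[0]
    if n_files % n_z != 0:
        raise ValueError("number of files is not a multiple of n_z")
    return stack.reshape((n_files // n_z, n_z) + tuple(stack.shape[1:]))


# Stand-in for the lazily loaded stack: 6 slices of 4 x 5 pixels
frames = np.arange(6 * 4 * 5).reshape(6, 4, 5)
stack = da.from_array(frames, chunks=(1, 4, 5))

movie = as_movie(stack, n_z=3)  # 2 time points, 3 z slices each
print(movie.shape)
```

The result is still a lazy dask array, so indexing such as `movie[1, 0]` only touches the slices it needs once computed.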

Older answer that stores images into HDF5

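Data ingest is often the trickiest of problems. Dask.array doesn't have any automatic integration with image files (though this is quite doable if there is sufficient interest). Fortunately, moving data into h5py is easy because h5py supports the numpy slicing syntax. In the following example we'll create an empty h5py Dataset and then store four tiny tiff files into that dataset in a for loop.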

First we get the filenames for our images (please forgive the toy dataset; I don't have anything realistic lying around).

In [1]: from glob import glob
In [2]: filenames = sorted(glob('foo.*.tiff'))
In [3]: filenames
Out[3]: ['foo.1.tiff', 'foo.2.tiff', 'foo.3.tiff', 'foo.4.tiff']
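
One caveat with plain `sorted` is that it orders names lexicographically, so with ten or more files `foo.10.tiff` would come before `foo.2.tiff` and the frames would be stored out of order. The following is a small sketch of a numeric-aware sort key. The helper name `natural_key` is hypothetical, and it assumes all filenames follow the same pattern with the frame index written as a run of digits.

```python
import re


def natural_key(name):
    """Split a name into text and integer parts so digits compare numerically."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]


names = ['foo.10.tiff', 'foo.2.tiff', 'foo.1.tiff']

lexicographic = sorted(names)
numeric = sorted(names, key=natural_key)

print(lexicographic)  # ['foo.1.tiff', 'foo.10.tiff', 'foo.2.tiff']
print(numeric)        # ['foo.1.tiff', 'foo.2.tiff', 'foo.10.tiff']
```

Zero-padding the indices in the filenames (foo.001.tiff and so on) would avoid the problem without a custom key.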

Load in and inspect a sample image

In [4]: from skimage.io import imread
In [5]: im = imread(filenames[0])  # a sample image
In [6]: im.shape  # tiny image
Out[6]: (5, 5, 3)
In [7]: im.dtype
Out[7]: dtype('int8')

Now we'll make an HDF5 file and an HDF5 dataset named '/x' within that file.

In [8]: import h5py
In [9]: f = h5py.File('myfile.hdf5', 'a')  # make an hdf5 file ('a' creates it if missing; h5py 3+ defaults to read-only)
In [10]: out = f.require_dataset('/x', shape=(len(filenames), 5, 5, 3), dtype=im.dtype)
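
For a dataset far larger than the toy one, the on-disk layout starts to matter. As a sketch under assumptions (not part of the original answer), the dataset could be created with one HDF5 chunk per image so that reading a single frame touches a single chunk, optionally with gzip compression. The file name `chunked.hdf5` and the sizes below are illustrative only.

```python
import os
import tempfile

import h5py
import numpy as np

image_shape = (5, 5, 3)  # shape of one image, as in the toy example
n_images = 4

path = os.path.join(tempfile.mkdtemp(), 'chunked.hdf5')

with h5py.File(path, 'w') as f:
    dset = f.create_dataset(
        '/x',
        shape=(n_images,) + image_shape,
        dtype='int8',
        chunks=(1,) + image_shape,  # one image per HDF5 chunk
        compression='gzip',         # optional: trades CPU time for disk space
    )
    dset[0] = np.ones(image_shape, dtype='int8')  # write a single image
    chunk_layout = dset.chunks
    first_image_sum = int(dset[0].sum())

print(chunk_layout)
```

Matching the HDF5 chunk shape to the chunks later passed to `da.from_array` keeps each dask task reading whole chunks rather than fragments of several.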

Great, now we can insert our images one at a time into the HDF5 dataset.

In [11]: for i, fn in enumerate(filenames):
   ....:     im = imread(fn)
   ....:     out[i, :, :, :] = im
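
The loop above only ever holds one image in memory, which is what makes it workable for data that does not fit in RAM. Here is a hedged sketch that wraps the same steps in a reusable function which infers the shape from the first image and closes the file when done. The function `stack_to_hdf5` and its `read_image` parameter are hypothetical names. To keep the sketch self-contained, `.npy` files and `np.load` stand in for the tiff files and `skimage.io.imread`.

```python
import os
import tempfile

import h5py
import numpy as np


def stack_to_hdf5(filenames, h5_path, read_image, dataset='/x'):
    """Copy images into one HDF5 dataset, one image in memory at a time."""
    first = read_image(filenames[0])
    with h5py.File(h5_path, 'w') as f:
        out = f.create_dataset(
            dataset,
            shape=(len(filenames),) + first.shape,
            dtype=first.dtype,
        )
        out[0] = first
        for i, fn in enumerate(filenames[1:], start=1):
            out[i] = read_image(fn)  # only this image is held in memory
    return h5_path


# Stand-in data: .npy files play the role of the tiff slices
workdir = tempfile.mkdtemp()
filenames = []
for i in range(4):
    fn = os.path.join(workdir, 'foo.%d.npy' % (i + 1))
    np.save(fn, np.full((5, 5, 3), i, dtype='int8'))
    filenames.append(fn)

h5_path = stack_to_hdf5(filenames, os.path.join(workdir, 'myfile.hdf5'),
                        read_image=np.load)

with h5py.File(h5_path, 'r') as f:
    stored_shape = f['/x'].shape
    last_image = f['/x'][3]

print(stored_shape)
```

With real data, passing `read_image=imread` from `skimage.io` should behave the same way, provided every file has the same shape and dtype.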

At this point dask.array can wrap `out` happily

In [12]: import dask.array as da
In [13]: x = da.from_array(out, chunks=(1, 5, 5, 3))  # treat each image as a single chunk
In [14]: x[::2, :, :, 0].mean()
Out[14]: dask.array<x_3, shape=(), chunks=(), dtype=float64>
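
Note that `Out[14]` is a lazy dask array, not a number; nothing is read until `.compute()` is called. A minimal sketch of the full round trip follows, assuming a small file written on the spot in place of the real dataset. The file name `demo.hdf5` is illustrative.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.hdf5')
data = np.arange(4 * 5 * 5 * 3).reshape(4, 5, 5, 3)

# Write a small stand-in dataset
with h5py.File(path, 'w') as f:
    f.create_dataset('/x', data=data)

# Reopen read-only and wrap the on-disk dataset without loading it
with h5py.File(path, 'r') as f:
    x = da.from_array(f['/x'], chunks=(1, 5, 5, 3))  # one image per chunk
    lazy = x[::2, :, :, 0].mean()  # builds a task graph, reads nothing yet
    result = lazy.compute()        # reads only the chunks that are needed

expected = data[::2, :, :, 0].mean()
print(result, expected)
```

The call to `.compute()` has to happen while the HDF5 file is still open, since the chunks are read from disk at that point.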

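If you'd like to see more native support for stacks of images then I encourage you to raise an issue. It would be fairly easy to use dask.array on your stack of tiff files directly, without going through HDF5.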
