Efficient reading of netcdf variable in python

Question

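I need to be able to quickly read lots of netCDF variables in python (1 variable per file). I'm finding that the Dataset function in the netCDF4 library is rather slow compared to the reading utilities in other languages (e.g., IDL).

My variables have shape (2600,5200) and are of type float. They don't seem that big to me (file size = 52 MB).

Here is my code: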

import numpy as np
from netCDF4 import Dataset
import time
file = '20151120-235839.netcdf'
t0=time.time()
openFile = Dataset(file,'r')
raw_data = openFile.variables['MergedReflectivityQCComposite']
data = np.copy(raw_data)
openFile.close()
print(time.time() - t0)

It takes about 3 seconds to read one variable (one file). I think the main slowdown is np.copy. raw_data is <type 'netCDF4.Variable'>, hence the copy. Is this the best/fastest way to do netCDF reads in python?

Thanks.

Answer 1

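I'm not sure what to say about the np.copy operation (which is indeed slow), but I find that the PyNIO module from UCAR works well for both NetCDF and HDF files. This will place data into a numpy array: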

import Nio

f = Nio.open_file(file, format="netcdf")
data = f.variables['MergedReflectivityQCComposite'][:]
f.close()

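Testing your code against the PyNIO code on a NetCDF file I had on hand, I got 1.1 seconds for PyNIO versus 3.1 seconds for the netCDF4 module. Your results may vary; it's worth a look though.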
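If you want to repeat this kind of comparison on your own files, a small timing helper keeps the measurement consistent. This is only a sketch: `best_time` and `compare_readers` are names made up for illustration, and each reader is assumed to be a function that takes a file path and returns the data.

```python
import time

def best_time(reader, path, repeats=3):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to reader(path)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)
        timings.append(time.perf_counter() - start)
    return min(timings)

def compare_readers(readers, path, repeats=3):
    """Time every reader in `readers` (a dict of name -> callable) on the same file."""
    return {name: best_time(func, path, repeats) for name, func in readers.items()}
```

You would wrap the netCDF4 and PyNIO snippets in two functions and pass them in as `{'netCDF4': ..., 'PyNIO': ...}`. Keep in mind that the first read of a file may be slower than later ones because of the operating system's file cache, so taking the best of several runs mostly measures the library overhead rather than the disk.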

Answer 2

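The power of Numpy is that you can create views into existing data in memory via the metadata it retains about the data. So a copy will always be slower than a view, which only points at the existing data. As JCOidl says, it's not clear why you don't just use: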

 raw_data = openFile.variables['MergedReflectivityQCComposite'][:]
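Since the question is about many files with one variable each, the slice read can be wrapped up roughly like this. It is a sketch under assumptions: `read_with_netcdf4` and `load_stack` are hypothetical helper names, and every file is assumed to hold the same variable name with the same shape.

```python
import numpy as np

def read_with_netcdf4(path, name):
    """Read one variable fully into memory with a slice, then close the file."""
    from netCDF4 import Dataset  # imported lazily so load_stack works with other readers
    with Dataset(path, 'r') as nc:
        return nc.variables[name][:]

def load_stack(paths, name, reader=read_with_netcdf4):
    """Read the same variable from each file and stack along a new leading axis."""
    # np.asarray discards the mask if the reader returns a masked array
    arrays = [np.asarray(reader(path, name)) for path in paths]
    return np.stack(arrays)
```

Passing the reader as an argument makes it easy to swap in a different library and time the two against each other.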

For details, see the SciPy Cookbook and the SO question "View onto a numpy array?".
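To make the view versus copy distinction concrete, here is a minimal illustration on a small in-memory array (the array itself is just made-up example data):

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(3, 4)

view = data[:, 1:3]    # basic slicing returns a view: no data is duplicated
copy = np.copy(data)   # np.copy allocates new memory and duplicates every element

print(np.shares_memory(data, view))   # True
print(np.shares_memory(data, copy))   # False

view[0, 0] = -1.0      # writing through the view changes the original array
print(data[0, 1])      # -1.0
```

Note that this applies to arrays already in memory. Slicing a netCDF4 variable with [:] still has to read the values from disk into a new array.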

Answer 3

You can use xarray for this.

import xarray as xr

### Single netcdf file ###
ds = xr.open_dataset('path/file.nc')

### Opening multiple NetCDF files and concatenating them by time ###
# Recent xarray versions require combine='nested' when concat_dim is given
ds = xr.open_mfdataset('path/*.nc', combine='nested', concat_dim='time')

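To read the variable, you can simply type ds.MergedReflectivityQCComposite or ds['MergedReflectivityQCComposite'][:]

You can also use xr.load_dataset, but I find that it takes up more space than the open function. For xr.open_mfdataset, you can also chunk along the dimensions of the file if you want. Both functions have other options, which you can read more about in the xarray documentation if you are interested.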

python

performance

netcdf