xarray MissingDimensionsError when reading remote dataset (NBM)

When reading the NBM remote dataset (https://vlab.ncep.noaa.gov/web/mdl/nbm), I get an xarray.core.variable.MissingDimensionsError. I'm sure I'm missing some argument settings in open_dataset.

You can view the data structure here: https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD.html. The full structure can be printed with ncdump -h https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD

A plain open_dataset call fails on the time1 variable:

import xarray as xr
url = "https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD"
ds = xr.open_dataset(url)  # raises MissingDimensionsError for 'time1' (full traceback below)

If you drop that variable, it just moves on to the next time dimension:

ds = xr.open_dataset(url, drop_variables="time1")
xarray.core.variable.MissingDimensionsError: 'time2' has more than 1-dimension and the same name as one of its dimensions ('reftime4', 'time2'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

Full traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 575, in open_dataset
    ds = maybe_decode_store(store, chunks)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 471, in maybe_decode_store
    ds = conventions.decode_cf(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/conventions.py", line 600, in decode_cf
    ds = Dataset(vars, attrs=attrs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/dataset.py", line 630, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py", line 467, in merge_data_and_coords
    return merge_core(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py", line 594, in merge_core
    collected = collect_variables_and_indexes(aligned)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py", line 278, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/variable.py", line 154, in as_variable
    raise MissingDimensionsError(
xarray.core.variable.MissingDimensionsError: 'time1' has more than 1-dimension and the same name as one of its dimensions ('reftime', 'time1'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
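Rather than dropping the offending variables one at a time, a sketch like the following lists all of them up front by applying the rule quoted in the error message: a variable with more than one dimension whose name is also one of its own dimensions. The helper name conflicting_variables is hypothetical, and it assumes the dataset is first opened with netCDF4 (which does not enforce this rule). Dropping these variables may let open_dataset succeed, but the 2-D valid-time information they carry is then lost, so treat this as a diagnostic rather than a full fix.

```python
def conflicting_variables(nc):
    """Return names of variables that xarray rejects with MissingDimensionsError.

    These are variables with more than one dimension whose name is also one
    of their own dimensions, e.g. time1(reftime, time1). `nc` is assumed to be
    any object with a `variables` mapping whose values expose a `dimensions`
    tuple, such as a netCDF4.Dataset opened on the OPeNDAP URL.
    """
    return [
        name
        for name, var in nc.variables.items()
        if len(var.dimensions) > 1 and name in var.dimensions
    ]


# Usage (assumes netCDF4 is installed and the server is reachable; not run here):
#   import netCDF4
#   import xarray as xr
#   url = "https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD"
#   names = conflicting_variables(netCDF4.Dataset(url))
#   ds = xr.open_dataset(url, drop_variables=names)
```

Collecting the names first avoids the drop-and-retry loop shown above, where each run only reveals the next offending time variable.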

You can also test this locally with a single GRIB2 file:

wget https://ftp.ncep.noaa.gov/data/nccf/com/blend/prod/blend.20210214/00/core/blend.t00z.core.f001.co.grib2

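If you want to access these "TwoD" datasets from the THREDDS Forecast Model Run Collection (FMRC) virtual datasets in Xarray, you can slice them with the NetCDF library first and then pass the sliced variables to Xarray. If you wrap the NetCDF variable with Dask, you can keep everything lazy.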

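Here's an example that extracts a "Best Time Series" of the last 60 values for HRRR, but using the 1-hour forecast data (instead of the "analysis" 0-hour forecast time series you get by default with FMRC Best):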

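import netCDF4
import xarray as xr
import dask.array as dsa
import hvplot.xarray  # registers the .hvplot plotting accessor on xarray objects

url = 'https://thredds.unidata.ucar.edu/thredds/dodsC/grib/NCEP/HRRR/CONUS_2p5km/TwoD'
nc = netCDF4.Dataset(url)  # opens the OPeNDAP endpoint; data is only fetched on indexing
# Wrap the netCDF4 variable in a Dask array so nothing is downloaded yet
arr = dsa.from_array(nc['Temperature_height_above_ground'])
tau = 1  # index along the forecast-time dimension: the 1-hour forecast
# Last 60 model runs, forecast index tau, first height level -> (time, y, x)
temp = xr.DataArray(arr[-60:, tau, 0, :, :], dims=['time', 'y', 'x'], name='temp')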
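To see what fixing tau selects, here is a small numpy sketch with made-up hourly runs (not real HRRR data). It assumes, as in the THREDDS TwoD layout, that the 2-D time coordinate (the kind of variable, like time1(reftime, time1), that trips xarray's check) holds valid times indexed by (reference time, forecast offset). Holding the second index at tau then gives one valid time per run, i.e. a continuous hourly series of 1-hour forecasts.

```python
import numpy as np

# Hypothetical stand-in for a TwoD collection: 72 hourly model runs,
# each with forecast offsets of 0..18 hours (values are made up).
reftime = np.datetime64("2021-02-14T00:00") + np.arange(72) * np.timedelta64(1, "h")
offsets = np.arange(19) * np.timedelta64(1, "h")

# 2-D valid time with shape (reftime, time): run start plus forecast offset.
valid_time = reftime[:, None] + offsets[None, :]

tau = 1  # second index: the 1-hour forecast from each run
# Same indexing pattern as arr[-60:, tau, 0, :, :], minus the height/y/x axes.
best_1h = valid_time[-60:, tau]

print(best_1h[0], best_1h[-1], best_1h.shape)
```

With tau = 0 the same slice would give the analysis times of each run, which is the series FMRC Best returns by default.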

Here's a time series plot showing that it works: