如何在 grib 中迭代?

How to iterate in a grib?

我正在尝试遍历 grib 文件以获取所有数据,但我不知道该怎么做。我正在 Python.

我正在使用这个代码

import xarray as xr

ds = xr.open_dataset("myFile.grib2", engine="pynio")

for v in ds:
    print(ds[v])

我得到的结果是这样的一些输出:

<xarray.DataArray 'PERPW_P0_L1_GLL0' (lat_0: 336, lon_0: 720)>
[241920 values with dtype=float32]
Coordinates:
  * lat_0    (lat_0) float32 90.0 89.5 89.0 88.5 ... -76.0 -76.5 -77.0 -77.5
  * lon_0    (lon_0) float32 0.0 0.5 1.0 1.5 2.0 ... 358.0 358.5 359.0 359.5
Attributes:
    center:                                         US National Weather Servi...
    production_status:                              Operational products
    long_name:                                      Primary wave mean period
    units:                                          s
    grid_type:                                      Latitude/longitude
    parameter_discipline_and_category:              Oceanographic products, W...
    parameter_template_discipline_category_number:  [ 0 10  0 11]
    level_type:                                     Ground or water surface
    level:                                          [1.]
    forecast_time:                                  [219]
    forecast_time_units:                            hours
    initial_time:                                   01/18/2021 (00:00)

我正在使用来自 NOAA 的 file

据我所知,这就像 table 的列。例如,如何获取所有 level 值的数据?

我通常使用 pygrib 作为 GRIB 文件,但我认为 xarray 也不错。

G = pygrib.open('file.grb'):
for g in G:
    print(g)

例如像这样获取坐标和值...

latlons = g.latlons()
values = g.values

检查所有其他可用密钥的数据.. 需要根据数据选择有效时间和预测步长偏移量。

print(g.keys())

我不确定您所说的 level 值是什么意思,但是对于您链接的文件,所有变量都是作为时间函数的二维字段。它们并不像 table 的列,而是二维网格的时间序列。

xarray 提供网格化数据的接口(如您所见in the documentation)。

我使用包 cfgrib 访问 GRIB 数据。它很可靠,由 ECMWF (https://github.com/ecmwf/cfgrib) 开发。

所以,如果你这样做:

import xarray as xr

d = xr.open_dataset('gefs.wave.t00z.c00.global.0p25.f000.grib2', engine = 'cfgrib')
print(d)

你明白了:

<xarray.Dataset>
Dimensions:     (latitude: 721, level: 3, longitude: 1440)
Coordinates:
    time        datetime64[ns] 2020-12-01
    step        timedelta64[ns] 00:00:00
    surface     int32 1
  * latitude    (latitude) float64 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * longitude   (longitude) float64 0.0 0.25 0.5 0.75 ... 359.3 359.5 359.8
    valid_time  datetime64[ns] 2020-12-01
  * level       (level) int32 1 2 3
Data variables:
    ws          (latitude, longitude) float32 nan nan nan nan ... nan nan nan
    wdir        (latitude, longitude) float32 ...
    u           (latitude, longitude) float32 ...
    v           (latitude, longitude) float32 ...
    siconc      (latitude, longitude) float32 ...
    swh         (latitude, longitude) float32 ...
    paramId_0   (latitude, longitude) float32 ...
    mwp         (latitude, longitude) float32 ...
    perpw       (latitude, longitude) float32 ...
    mwd         (latitude, longitude) float32 ...
    dirpw       (latitude, longitude) float32 ...
    shww        (latitude, longitude) float32 ...
    swell       (level, latitude, longitude) float32 ...
    mpww        (latitude, longitude) float32 ...
    swper       (level, latitude, longitude) float32 ...
    wvdir       (latitude, longitude) float32 ...
    swdir       (level, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP 
    history:                 2021-02-23T16:14:32 GRIB to CDM+CF via cfgrib-0....

每个数据变量都有自己的坐标,您可以通过xarray:

访问它
print(d['swh'])
<xarray.DataArray 'swh' (latitude: 721, longitude: 1440)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    time        datetime64[ns] 2020-12-01
    step        timedelta64[ns] 00:00:00
    surface     int32 1
  * latitude    (latitude) float64 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * longitude   (longitude) float64 0.0 0.25 0.5 0.75 ... 359.3 359.5 359.8
    valid_time  datetime64[ns] 2020-12-01
Attributes:
    GRIB_paramId:                             140229
    GRIB_shortName:                           swh
    GRIB_units:                               m
    GRIB_name:                                Significant height of combined ...
    GRIB_cfVarName:                           swh
    GRIB_dataType:                            an
    GRIB_missingValue:                        9999
    GRIB_numberOfPoints:                      1038240
    GRIB_typeOfLevel:                         surface
    GRIB_NV:                                  0
    GRIB_stepUnits:                           1
    GRIB_stepType:                            instant
    GRIB_gridType:                            regular_ll
    GRIB_gridDefinitionDescription:           Latitude/longitude. Also called...
    GRIB_Nx:                                  1440
    GRIB_iDirectionIncrementInDegrees:        0.25
    GRIB_iScansNegatively:                    0
    GRIB_longitudeOfFirstGridPointInDegrees:  0.0
    GRIB_longitudeOfLastGridPointInDegrees:   359.750016
    GRIB_Ny:                                  721
    GRIB_jDirectionIncrementInDegrees:        0.25
    GRIB_jPointsAreConsecutive:               0
    GRIB_jScansPositively:                    0
    GRIB_latitudeOfFirstGridPointInDegrees:   90.0
    GRIB_latitudeOfLastGridPointInDegrees:    -90.0
    long_name:                                Significant height of combined ...
    units:                                    m

变量是 DataArray,但您可以使用 values 访问底层网格化数据,因此:

d['swh'].values
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)

或使用 to_* 函数导出,例如导出到 Pandas DataFrame:

print(d['swh'].to_dataframe().head())
                         time   step  surface valid_time  swh
latitude longitude                                           
90.0     0.00      2020-12-01 0 days        1 2020-12-01  NaN
         0.25      2020-12-01 0 days        1 2020-12-01  NaN
         0.50      2020-12-01 0 days        1 2020-12-01  NaN
         0.75      2020-12-01 0 days        1 2020-12-01  NaN
         1.00      2020-12-01 0 days        1 2020-12-01  NaN