使用 xarray 计算月平均值

Using xarray to make monthly average

我正在处理 GLDAS 再分析数据,周期为 1 year。这些文件是 .nc4。我可以打开文件,但我无法为 1 month 提供 groupby。我不想做 hand 或 by for,我发现 xarraygroupby。我的脚本是:

In[16]:import xarray as xr

In[17]:gldas = xr.open_mfdataset('./GLDAS_2010/*.nc4', chunks=None, concat_dim='time', preprocess=None, engine='netcdf4', lock=None,compat='minimal',coords='minimal',data_vars='minimal')

In[18]: gldas
Out[18]: 
<xarray.Dataset>
Dimensions:                (bnds: 2, lat: 40, lon: 48, time: 365)
Coordinates:
  * lat                    (lat) float32 -34.875 -34.625 -34.375 -34.125 ...
  * lon                    (lon) float32 -59.875 -59.625 -59.375 -59.125 ...
  * time                   (time) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Dimensions without coordinates: bnds
Data variables:
    Albedo_inst            (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    AvgSurfT_inst          (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    CanopInt_inst          (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    ECanop_tavg            (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    ESoil_tavg             (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Evap_tavg              (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    LWdown_f_tavg          (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Lwnet_tavg             (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    PotEvap_tavg           (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Psurf_f_inst           (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Qair_f_inst            (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Qg_tavg                (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Qh_tavg                (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Qle_tavg               (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Qs_acc                 (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Qsb_acc                (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Qsm_acc                (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Rainf_f_tavg           (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Rainf_tavg             (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    RootMoist_inst         (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SWE_inst               (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SWdown_f_tavg          (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SnowDepth_inst         (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Snowf_tavg             (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SoilMoi0_10cm_inst     (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SoilMoi100_200cm_inst  (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SoilMoi40_100cm_inst   (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SoilTMP0_10cm_inst     (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SoilTMP100_200cm_inst  (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SoilTMP10_40cm_inst    (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    SoilTMP40_100cm_inst   (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Swnet_tavg             (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Tair_f_inst            (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Tveg_tavg              (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    Wind_f_inst            (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
    time_bnds              (time, bnds) float64 dask.array<shape=(365, 2), chunksize=(1, 2)>
Attributes:
    CDI:                    Climate Data Interface version 1.6.9 (http://mpim...
    Conventions:            CF-1.4
    history:                Mon Feb  5 20:07:25 2018: /usr/bin/ncks -O -L 1 /...
    source:                 Noah_v3.3
    institution:            NASA GSFC
    missing_value:          -9999.0
    tavg definision::       past 3-hour average
    acc definision::        past 3-hour accumulation
    inst definision::       instantaneous
    title:                  GLDAS2.1 LIS land surface model output
    references:             Rodell_etal_BAMS_2004, Kumar_etal_EMS_2006, Peter...
    conventions:            CF-1.6
    comment:                website: http://ldas.gsfc.nasa.gov/gldas, http://...
    MAP_PROJECTION:         EQUIDISTANT CYLINDRICAL
    SOUTH_WEST_CORNER_LAT:  -59.875
    SOUTH_WEST_CORNER_LON:  -179.875
    DX:                     0.25
    DY:                     0.25
    CDO:                    Climate Data Operators version 1.6.9 (http://mpim...
    NCO:                    20180205

当我尝试时:

In[19]:pp.set_index('time').groupby(freq='1M').mean('time')

出现此错误:

TypeError: groupby() got an unexpected keyword argument 'freq'

好的,知道有这个论点我将进行下一次尝试:

In[20]:import pandas as pd
In[21]:pp.groupby(['Albedo_inst',pd.Grouper(key='‌​time', freq='M')]).mean()

错误是:

TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension

上次尝试没有频率:

In[22]:pp.groupby('Albedo_inst').mean('time')

错误:

ValueError: Level values must be unique: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] on level 0

我试试:

gldas.resample('1MS', dim='time', how='mean')

错误是:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Float64Index'

我试试:

gldas.groupby('time.month').mean('time')

错误是:

AttributeError: 'IndexVariable' object has no attribute 'month'

有人愿意帮助我吗?谢谢大家!

编辑:

In [24]: pp['time']
Out[24]: 
<xarray.DataArray 'time' (time: 365)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * time     (time) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Attributes:
    standard_name: time
    bounds: time_bnds
    units: day as %Y%m%d.%f
    calendar: proleptic_gregorian

编辑:

如果我打开一个文件:

In[108]:gldas
Out[108]: 
<xarray.Dataset>
Dimensions:                (bnds: 2, lat: 40, lon: 48, time: 1)
Coordinates:
  * lat                    (lat) float32 -34.875 -34.625 -34.375 -34.125 ...
  * lon                    (lon) float32 -59.875 -59.625 -59.375 -59.125 ...
  * time                   (time) float64 0.0
  * bnds                   (bnds) int64 0 1
Data variables:
    Albedo_inst            (time, lat, lon) float64 15.66 16.0 16.23 15.98 ...
    AvgSurfT_inst          (time, lat, lon) float64 302.8 302.3 302.3 302.4 ...
    CanopInt_inst          (time, lat, lon) float64 0.001 0.0 0.0 0.0 0.0 ...
    ECanop_tavg            (time, lat, lon) float64 2.48 0.89 0.16 0.0 0.02 ...
    ESoil_tavg             (time, lat, lon) float64 46.39 46.58 49.18 55.22 ...
    Evap_tavg              (time, lat, lon) float64 0.0001668 0.0001683 ...
    LWdown_f_tavg          (time, lat, lon) float64 369.5 368.3 366.8 365.6 ...
    Lwnet_tavg             (time, lat, lon) float64 -88.05 -87.24 -87.68 ...
    PotEvap_tavg           (time, lat, lon) float64 622.3 610.4 601.1 599.6 ...
    Psurf_f_inst           (time, lat, lon) float64 1.004e+05 1.006e+05 ...
    Qair_f_inst            (time, lat, lon) float64 0.01117 0.01144 0.01172 ...
    Qg_tavg                (time, lat, lon) float64 54.2 53.65 55.13 58.52 ...
    Qh_tavg                (time, lat, lon) float64 110.4 101.9 104.5 114.7 ...
    Qle_tavg               (time, lat, lon) float64 417.2 420.8 416.1 405.6 ...
    Qs_acc                 (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Qsb_acc                (time, lat, lon) float64 0.00253 0.00335 0.00528 ...
    Qsm_acc                (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Rainf_f_tavg           (time, lat, lon) float64 1.4e-06 5e-07 1e-07 0.0 ...
    Rainf_tavg             (time, lat, lon) float64 1.4e-06 5e-07 1e-07 0.0 ...
    RootMoist_inst         (time, lat, lon) float64 255.3 259.6 263.9 266.1 ...
    SWE_inst               (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
    SWdown_f_tavg          (time, lat, lon) float64 793.9 789.7 791.5 794.6 ...
    SnowDepth_inst         (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Snowf_tavg             (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
    SoilMoi0_10cm_inst     (time, lat, lon) float64 24.86 25.17 25.47 25.51 ...
    SoilMoi100_200cm_inst  (time, lat, lon) float64 254.4 258.8 266.1 274.1 ...
    SoilMoi40_100cm_inst   (time, lat, lon) float64 153.7 156.7 159.8 161.6 ...
    SoilTMP0_10cm_inst     (time, lat, lon) float64 293.8 293.5 293.4 293.7 ...
    SoilTMP100_200cm_inst  (time, lat, lon) float64 291.3 291.0 290.9 290.9 ...
    SoilTMP10_40cm_inst    (time, lat, lon) float64 292.4 292.2 292.1 292.2 ...
    SoilTMP40_100cm_inst   (time, lat, lon) float64 292.2 292.0 291.8 291.9 ...
    Swnet_tavg             (time, lat, lon) float64 669.5 663.3 663.0 667.7 ...
    Tair_f_inst            (time, lat, lon) float64 300.2 300.0 299.8 299.5 ...
    Tveg_tavg              (time, lat, lon) float64 368.3 373.3 366.8 350.4 ...
    Wind_f_inst            (time, lat, lon) float64 3.704 3.704 3.704 3.704 ...
    time_bnds              (time, bnds) float64 0.0 0.0
Attributes:
    CDI: Climate Data Interface version 1.6.9 (http://mpimet.mpg.de/cdi)
    Conventions: CF-1.4
    history: Mon Feb  5 20:07:25 2018: /usr/bin/ncks -O -L 1 /tmpdata/regridder/services_88661/cdoGLDAS_NOAH025_3H.A20100101.1500.021.SUB.nc4 /tmpdata/regridder/services_88661/deflatecdoGLDAS_NOAH025_3H.A20100101.1500.021.SUB.nc4
Mon Feb 05 20:07:21 2018: cdo -s -L -f nc4 -sellonlatbox,-60.0,-48.0,-35.0,-25.0 -selname,Albedo_inst,AvgSurfT_inst,CanopInt_inst,ECanop_tavg,ESoil_tavg,Evap_tavg,LWdown_f_tavg,Lwnet_tavg,PotEvap_tavg,Psurf_f_inst,Qair_f_inst,Qg_tavg,Qh_tavg,Qle_tavg,Qs_acc,Qsb_acc,Qsm_acc,Rainf_...
    source: Noah_v3.3
    institution: NASA GSFC
    missing_value: -9999.0
    tavg definision:: past 3-hour average
    acc definision:: past 3-hour accumulation
    inst definision:: instantaneous
    title: GLDAS2.1 LIS land surface model output
    references: Rodell_etal_BAMS_2004, Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
    conventions: CF-1.6
    comment: website: http://ldas.gsfc.nasa.gov/gldas, http://lis.gsfc.nasa.gov/
    MAP_PROJECTION: EQUIDISTANT CYLINDRICAL
    SOUTH_WEST_CORNER_LAT: -59.875
    SOUTH_WEST_CORNER_LON: -179.875
    DX: 0.25
    DY: 0.25
    CDO: Climate Data Operators version 1.6.9 (http://mpimet.mpg.de/cdo)
    NCO: 20180205/usr/local/lib/python3.6/dist-packages/spyder/widgets/variableexplorer/utils.py:414: FutureWarning: 'summary' is deprecated and will be removed in a future version.
  display = value.summary()

结论: 我知道了。如果有人需要帮助,请给我留言。

我无法获得 xr.open_mfdataset,所以我不得不创建一个新的 ds 并继续努力。

files = glob('./GLDAS_2010/*.nc4')

data = Dataset(files[0], mode='r')
albedo=0

for file in files:
    dados = Dataset(file, mode='r')
    lons = dados.variables['lon'][:]
    lats = dados.variables['lat'][:]
    Albed = dados.variables['Albedo_inst'][0,:,:]
    time=datetime.strptime(file[-29:-16], '%Y%m%d.%H%M')
    ds = xr.DataArray(Albed, coords={'lat':lats, 'lon':lons,'time':time}, dims=['lat','lon']).to_dataset(name='Albedo')
    if type(albedo)==int:
        albedo=ds
    else:
        albedo = xr.concat([albedo, ds], 'time')

albedo = albedo.sortby('time')

mean_month = albedo.Albedo.resample(freq='M',dim='time')

如果您没有有效的时间变量(上面示例中的时间始终为 0...),您将不得不修复它。但是一旦完成,我认为您正在寻找的行是:

gldas['Albedo_inst'].resample(time="1MS").mean(dim="time")