Averaging 2 decades of data on a 6-hourly timestep using netCDF data and Python

Question

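I have two decades of spatially varying wind data, recorded every six hours. I need to average the two decades of data for each six-hourly time slot, so that I end up with 365 * 4 time steps. The data are in netCDF format.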

The data look like this:

import xarray as xr
filename = 'V-01011999-01012019.nc'
ds = xr.open_dataset(filename)

print(ds)
<xarray.Dataset>
Dimensions:  (lat: 8, lon: 7, time: 29221)
Coordinates:
  * lat      (lat) float32 -2.5 -5.0 -7.5 -10.0 -12.5 -15.0 -17.5 -20.0
  * lon      (lon) float32 130.0 132.5 135.0 137.5 140.0 142.5 145.0
  * time     (time) datetime64[ns] 1999-01-01 1999-01-01T06:00:00 ... 2019-01-01
Data variables:
    vwnd     (time, lat, lon) float32 ...

# remove Feb 29 from the records
ds = ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))
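For anyone who wants to reproduce this without the original file, here is a minimal sketch that builds a synthetic stand-in with the same time, lat and lon axes as the printout above. The values are random and the file 'V-01011999-01012019.nc' is not opened; everything in it other than the Feb 29 filter from the question is an illustrative assumption. It also shows how many time steps the filter removes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for 'V-01011999-01012019.nc': same time/lat/lon axes,
# random vwnd values (the real data are not reproduced here).
time = pd.date_range("1999-01-01", "2019-01-01", freq="6h")
lat = np.arange(-2.5, -22.5, -2.5, dtype="float32")
lon = np.arange(130.0, 147.5, 2.5, dtype="float32")
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"vwnd": (("time", "lat", "lon"),
              rng.standard_normal((time.size, lat.size, lon.size)).astype("float32"))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Same filter as in the question: drop every 29 February time step
# (4 per leap year: 00, 06, 12 and 18 h).
ds = ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))
print(ds.sizes["time"])  # 29221 - 5 leap years * 4 = 29201
```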

I have been able to group by day of year to get the two-decade average for each day of the year.

tsavg = ds.groupby('time.dayofyear').mean('time')

print(tsavg)
<xarray.Dataset>
Dimensions:    (dayofyear: 366, lat: 8, lon: 7)
Coordinates:
  * lat        (lat) float32 -2.5 -5.0 -7.5 -10.0 -12.5 -15.0 -17.5 -20.0
  * lon        (lon) float32 130.0 132.5 135.0 137.5 140.0 142.5 145.0
  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366
Data variables:
    vwnd       (dayofyear, lat, lon) float32 -2.61605 -1.49012 ... -0.959997

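What I really want is a time coordinate of length 365 * 4 (four 6-hour intervals per day), where each time step is the average of that time step over the past 20 years. Also, for some reason tsavg.dayofyear still has length 366, even though I removed Feb 29. I was not able to apply or follow the answers in this post. I have read extensively about groupby and tried many things, but I cannot figure it out. I am looking for help with the code.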

Answer 1

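There really isn't a well-documented way to do this. Also note that dayofyear probably does not do what you expect: it is computed from each date's own calendar year, so removing Feb 29 does not shift the later dates of a leap year back by one day. March 1 is day 61 in a leap year but day 60 in other years, and December 31 of a leap year is still day 366, which is why tsavg.dayofyear has 366 entries.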
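To see why tsavg.dayofyear still has 366 entries after every 29 February is dropped, here is a minimal pandas check on a hypothetical six-hourly time axis covering 1999 to 2018 (no wind data involved). It shows that dayofyear numbering depends on whether the year is a leap year, so grouping on it puts, for example, 1 March of leap and non-leap years into different groups.

```python
import pandas as pd

# Six-hourly timestamps over two decades, with every 29 February removed.
times = pd.date_range("1999-01-01", "2018-12-31 18:00", freq="6h")
times = times[~((times.month == 2) & (times.day == 29))]

# dayofyear is computed from each date's own calendar year, so dropping
# 29 February does not renumber the days that follow it in a leap year.
print(pd.Timestamp("1999-03-01").dayofyear)  # 60 (non-leap year)
print(pd.Timestamp("2000-03-01").dayofyear)  # 61 (leap year)
print(times.dayofyear.max())                 # 366 (31 Dec of a leap year)
print(times.dayofyear.nunique())             # still 366 distinct values
```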

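Instead of grouping on multiple levels (see, e.g., this answer on how to do something similar to what you are asking in pandas), which is not available in xarray, a fairly clean way to solve this kind of problem is to define a new coordinate for grouping that represents the "time of year" of each time in the dataset.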
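For comparison, the multi-key grouping that pandas supports can be sketched on a single hypothetical grid point stored as a pandas Series with random values. This is not the linked answer's code, just an illustration of grouping on month, day and hour at once.

```python
import numpy as np
import pandas as pd

# Hypothetical single-grid-point series: 6-hourly values over 1999-2018.
idx = pd.date_range("1999-01-01", "2018-12-31 18:00", freq="6h")
idx = idx[~((idx.month == 2) & (idx.day == 29))]  # drop 29 Feb
s = pd.Series(np.random.default_rng(0).standard_normal(idx.size), index=idx)

# Group on three keys at once: month, day and hour of each timestamp.
clim = s.groupby([s.index.month, s.index.day, s.index.hour]).mean()
print(len(clim))  # 365 days * 4 six-hour slots = 1460
```

The result is indexed by a (month, day, hour) MultiIndex, which is exactly the kind of multi-level grouping xarray's groupby could not do directly at the time of the answer.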

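In your case you want to group by "hour of the year" (i.e. matching month, day and hour). To do this, you can create an array of strings that are essentially the string representations of the dates in the time coordinate with the year removed: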

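# label each time step "MM-DD HH", i.e. its date and hour with the year dropped
ds['hourofyear'] = xr.DataArray(ds.indexes['time'].strftime('%m-%d %H'), coords=ds.time.coords)
# average all years sharing a label: 365 * 4 = 1460 groups once Feb 29 is removed
result = ds.groupby('hourofyear').mean('time')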
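A self-contained sketch of the same idea on synthetic data (random values, lat/lon sizes taken from the question's printout) is below. It attaches the labels with assign_coords rather than as a data variable, and then, as an optional extra not in the answer, turns the 1460 string labels back into a datetime "time" coordinate in an arbitrary non-leap placeholder year; the choice of 2001 is purely an assumption for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic stand-in for the wind file (random values).
time = pd.date_range("1999-01-01", "2018-12-31 18:00", freq="6h")
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"vwnd": (("time", "lat", "lon"), rng.standard_normal((time.size, 8, 7)))},
    coords={"time": time},
)
ds = ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))

# Label each timestamp "MM-DD HH" (year dropped) and average over matching labels.
ds = ds.assign_coords(hourofyear=("time", ds.indexes["time"].strftime("%m-%d %H")))
result = ds.groupby("hourofyear").mean("time")
print(result.sizes["hourofyear"])  # 1460 = 365 * 4

# Optional: map the labels onto a placeholder non-leap year (2001 is an
# arbitrary assumption) and make that the dimension coordinate.
labels = pd.to_datetime(["2001-" + lab for lab in result["hourofyear"].values],
                        format="%Y-%m-%d %H")
result = result.assign_coords(time=("hourofyear", labels)).swap_dims({"hourofyear": "time"})
```

Because the "%m-%d %H" strings are zero-padded, their sorted order is also chronological order, so the groups come out from "01-01 00" to "12-31 18".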

python-xarray