How to use xarray to group by time and then run a binning function on the groups?

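I have a multi-dimensional netCDF dataset of 'mean direction of total ocean swell' (mdts). The dimensions are time (hourly), latitude, and longitude. I simply want to group the hourly data by day and then, for each day and each lat/lon grid cell, determine which of 16 predefined directional bins contains the most hours (at most 24). For each lat/lon grid cell, the direction value associated with the bin holding the most hours would then be assigned as that cell's direction for that day. I am applying a custom function to the groupby command, and that is where the error occurs. I think I don't understand what is being passed to the function.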

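Note: each netCDF file covers one calendar month for the years 1979-2019. I therefore use groupby rather than resample, because resample adds the other 11 months, which are not in the file. I also first convert all times to 00:00 so that groupby can group by day.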

Note: my actual code is set up to loop over multiple netCDF files. I have simplified it to a single file here. My simplified code:

import numpy as np
import xarray as xr
        
ifile = 'mean_direction_total_swell_Nov_1979_2019_hourly.nc'
        
# min, max, and center values of angle direction bins
min = [348.75,  11.25,  33.75,  56.25,  78.75, 101.25, 123.75, 146.25, 168.75, 191.25, 213.75, 236.25, 258.75, 281.25, 303.75, 326.25]
max = [ 11.25,  33.75,  56.25,  78.75, 101.25, 123.75, 146.25, 168.75, 191.25, 213.75, 236.25, 258.75, 281.25, 303.75, 326.25, 348.75]
dir = [   0.0,   22.5,   45.0,   67.5,   90.0,  112.5,  135.0,  157.5,  180.0,  202.5,  225.0,  247.5,  270.0,  292.5,  315.0,  337.5]
    
# custom function that I think is causing the problem    
def bins(x):
    bins_n = np.zeros([16], dtype=int)
        
    # North bin requires 'or' statement
    if(x >= min[0] or x < max[0]): bins_n[0] = bins_n[0] + 1
        
    # other bins require 'and' statement
    for i in range(1,16,1): # bins
        if(x >= min[i] and x < max[i]):
            bins_n[i] = bins_n[i] + 1
            break
    slot = np.argmax(bins_n)
        
    return dir[slot]
    
   
idatanc = xr.open_dataset(ifile)              
idata = idatanc['mdts']                          
    
idata.coords['time'] = idata.time.dt.floor('1D') # setting all hourly values to 0000 
idata_dy = idata.groupby("time").apply(bins)

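Here is what is returned. Note: this error comes from the program that loops over multiple netCDF files, so it may not correspond exactly to the code above. The error is the same.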

Traceback (most recent call last):

  File "<ipython-input-216-82adffe45690>", line 9, in <module>
    idata_dy = idata.groupby("time").apply(bins)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py", line 815, in apply
    return self.map(func, shortcut=shortcut, args=args, **kwargs)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py", line 800, in map
    return self._combine(applied, shortcut=shortcut)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py", line 819, in _combine
    applied_example, applied = peek_at(applied)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\utils.py", line 183, in peek_at
    peek = next(gen)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py", line 799, in <genexpr>
    applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs)) for arr in grouped)

  File "<ipython-input-215-3d060f71ca15>", line 6, in bins
    if(x >= min[0] or x < max[0]): bins_n[0] = bins_n[0] + 1

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\common.py", line 119, in __bool__
    return bool(self.values)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
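
For context on the final ValueError: groupby hands the function one DataArray per day, holding every hour of that day at every grid point, not a single number. A comparison such as `x >= min[0]` is therefore itself an array, and using it in `if ... or ...` forces a single True/False out of many elements. The minimal sketch below reproduces this on synthetic data; the array contents, the `inspect` function, and the `log` dictionary are illustrative assumptions, not part of the original code, and it assumes an xarray version that provides `GroupBy.map` (as the traceback above suggests).

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for 'mdts': 2 days of hourly values on a 2 x 3 grid.
hours = np.arange(
    np.datetime64("1979-11-01T00"),
    np.datetime64("1979-11-03T00"),
    np.timedelta64(1, "h"),
).astype("datetime64[ns]")
values = np.linspace(0.0, 359.0, 48 * 2 * 3).reshape(48, 2, 3)
da = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={"time": hours},
    name="mdts",
)

# Same idea as in the question: floor every timestamp to midnight.
da = da.assign_coords(time=da.time.dt.floor("1D"))

log = {"sizes": [], "raised": False}  # hypothetical bookkeeping for this demo


def inspect(x):
    # x is a DataArray with all hours of one day at all grid points.
    log["sizes"].append(dict(x.sizes))
    try:
        if x >= 348.75 or x < 11.25:  # same pattern as in bins()
            pass
    except ValueError:
        log["raised"] = True
    # Return something reducible so groupby can combine the results.
    return x.mean("time")


da.groupby("time").map(inspect)
print(log["sizes"])   # one entry per day, each with 24 time steps
print(log["raised"])  # the scalar-style comparison raised ValueError
```

Any fix therefore has to operate on whole arrays (element-wise masks and reductions along the time axis) rather than on scalar `if` tests.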

I have not checked the results thoroughly, but I think the code below does what you need:

import numpy as np
import xarray as xr
from scipy import stats

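def func(x, axis):
    # reducer for .reduce(): x holds the binned values of one day and
    # axis is the time axis; keep the modal value, discard the count
    mode, count = np.apply_along_axis(stats.mode, axis, x)
    return mode.squeeze()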

infile = 'mean_direction_total_swell_Nov_1979_2019_hourly.nc'

ds = xr.open_dataset(infile)

# make sure range is 0 <= x < 360
ds['mdts'] = np.mod(ds['mdts'], 360)

# bin the data in 16 directions (17 actually, North appears as the first and
# last bin)
step = 360 / 16
centers = np.r_[np.r_[0: 360: step], 0]
edges = np.r_[0, np.r_[step / 2: 360: step], 360]

ds['mdts_binned_idx'] = (ds['mdts'].dims, np.digitize(ds['mdts'], edges))

ds['mdts_binned'] = (ds['mdts'].dims, centers[ds['mdts_binned_idx'] - 1])

# apply stats.mode to get the modal (most common) value in each day
ds2 = xr.Dataset()
ds2['mdts_mode_1d'] = ds['mdts_binned'].resample(time='1D').reduce(func)
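
Since the question mentions that resample introduces days that are not in the file, here is a hedged alternative sketch that keeps the question's own approach (floor the timestamps, then groupby) and swaps scipy's `stats.mode` for a plain NumPy count per bin. It is not the code above, only a variation under stated assumptions: the names `modal_direction`, `N_BINS`, and `STEP` are hypothetical, the data are synthetic, NaN hours are ignored, a cell with no valid hours gets NaN, and ties go to the lowest bin index (the original post does not say how ties should be resolved).

```python
import numpy as np
import xarray as xr

N_BINS = 16
STEP = 360.0 / N_BINS  # 22.5 degrees per bin


def modal_direction(x, axis):
    """Reducer: centre (degrees) of the most populated bin along `axis`."""
    valid = ~np.isnan(x)
    # Shift by half a bin so the north bin (348.75 to 11.25) gets index 0.
    shifted = (np.where(valid, x, 0.0) + STEP / 2) % 360.0
    idx = np.floor(shifted / STEP).astype(int) % N_BINS
    # counts[k] = number of valid hours in bin k, for every grid point.
    counts = np.stack(
        [((idx == k) & valid).sum(axis=axis) for k in range(N_BINS)]
    )
    winner = counts.argmax(axis=0) * STEP  # ties: lowest bin index wins
    return np.where(counts.sum(axis=0) > 0, winner, np.nan)


# Synthetic data: 2 days of hourly values on a 1 x 2 grid.
hours = np.arange(
    np.datetime64("1979-11-01T00"),
    np.datetime64("1979-11-03T00"),
    np.timedelta64(1, "h"),
).astype("datetime64[ns]")
values = np.empty((48, 1, 2))
values[:20, 0, 0] = 350.0    # day 1, cell 0: 20 h in the north bin
values[20:24, 0, 0] = 90.0   # day 1, cell 0: 4 h in the east bin
values[:24, 0, 1] = 100.0    # day 1, cell 1: all hours in the 90 bin
values[24:37, 0, 0] = 200.0  # day 2, cell 0: 13 h in the 202.5 bin
values[37:, 0, 0] = 10.0     # day 2, cell 0: 11 h in the north bin
values[24:, 0, 1] = np.nan   # day 2, cell 1: no valid data

da = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={"time": hours},
    name="mdts",
)

# Floor to midnight and group on the floored time, as in the question,
# so only days that are actually present produce an output value.
da = da.assign_coords(time=da.time.dt.floor("1D"))
daily = da.groupby("time").reduce(modal_direction, dim="time")
print(daily)
```

With real data, `da` would be `ds['mdts']` from the opened file; the reducer receives the raw angles, so no separate `np.digitize` step is needed in this variant.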