xarray: rolling mean of dask array conflicting sizes for data and coordinate in rolling operation

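I am trying to compute a rolling mean on a dask array within xarray. My issue may lie in the rechunking before the rolling mean. I am getting a ValueError about conflicting sizes between the data and the coordinates. However, the error arises inside the rolling operation itself: as far as I can tell, the array's data and coordinates do not conflict before the rolling operation is called.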

Sorry for not creating test data, but my project data is quick to work with:

import xarray as xr

remote_data = xr.open_dataarray('http://iridl.ldeo.columbia.edu/SOURCES/.Models'\
                                '/.SubX/.RSMAS/.CCSM4/.hindcast/.zg/dods',
                                chunks={'L': 1, 'S': 1})
da = remote_data.isel(P=0,L=0,M=0,X=0,Y=0)
da_day_clim = da.groupby('S.dayofyear').mean('S')
print(da_day_clim)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(1,)>
#Coordinates:
#    L          timedelta64[ns] 12:00:00
#    Y          float32 -90.0
#    M          float32 1.0
#    X          float32 0.0
#    P          int32 500
#  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...

# Do a 31-day rolling mean
# da_day_clim.rolling(dayofyear=31, center=True).mean()
# This brings up:
#ValueError: The overlapping depth 30 is larger than your
#smallest chunk size 1. Rechunk your array
#with a larger chunk size or a chunk size that
#more evenly divides the shape of your array.

# Read http://xarray.pydata.org/en/stable/dask.html
# and found http://xarray.pydata.org/en/stable/generated/xarray.Dataset.chunk.html#xarray.Dataset.chunk
# I could make a little PR to add the .chunk() into the ValueError message. Thoughts?

# Rechunk. Played around with a few values but decided on 
# the len of dayofyear
da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
print(da_day_clim2)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(366,)>
#Coordinates:
#    L          timedelta64[ns] 12:00:00
#    Y          float32 -90.0
#    M          float32 1.0
#    X          float32 0.0
#    P          int32 500
#  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...

# Rolling mean on this
da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-6acf382cdd3d> in <module>()
      4 da_day_clim = da.groupby('S.dayofyear').mean('S')
      5 da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
----> 6 da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()

~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/rolling.py in wrapped_func(self, **kwargs)
    307             if self.center:
    308                 values = values[valid]
--> 309             result = DataArray(values, self.obj.coords)
    310 
    311             return result

~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, fastpath)
    224 
    225             data = as_compatible_data(data)
--> 226             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    227             variable = Variable(dims, data, attrs, encoding, fastpath=True)
    228 

~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
     79                 raise ValueError('conflicting sizes for dimension %r: '
     80                                  'length %s on the data but length %s on '
---> 81                                  'coordinate %r' % (d, sizes[d], s, k))
     82 
     83         if k in sizes and v.shape != (sizes[k],):

ValueError: conflicting sizes for dimension 'dayofyear': length 351 on the data but length 366 on coordinate 'dayofyear'

The length 351 is related to the window size: 366 - 351 = 15, which is half of the 31-day window.
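The arithmetic can be sketched in plain Python. This is only an illustration of the lengths involved, not xarray's actual code. It assumes the centered result is produced by trimming half a window from the start of a trailing-window result, as the `values = values[valid]` line in the traceback suggests. The names `trimmed_unpadded` and `trimmed_padded` are made up for the example.

```python
# Sketch of the length arithmetic only; this is NOT xarray's implementation.
n_days = 366          # length of the dayofyear coordinate
window = 31           # rolling window from the question
half = window // 2    # 15 elements on each side of the window centre

data = list(range(n_days))

# Centring a trailing-window result means dropping the first `half` values.
# If nothing was padded onto the end first, the result is `half` elements short.
trimmed_unpadded = data[half:]

# Padding the end with `half` placeholder values before trimming keeps the length.
padded = data + [float("nan")] * half
trimmed_padded = padded[half:]

print(len(trimmed_unpadded))  # 351, the length reported in the error
print(len(trimmed_padded))    # 366, the length of the coordinate
```

If that assumption holds, the mismatch would point to a missing padding step on the dask code path rather than to the rechunking.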

This turned out to be a bug in xarray, which was fixed in https://github.com/pydata/xarray/pull/2122

The fix will be included in the upcoming xarray 0.10.4 release.
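To check whether an installed version includes the fix without touching the remote dataset, a synthetic array can stand in for the climatology. This is a minimal sketch, assuming both xarray and dask are installed. The values are made up, and only the shape, chunking, and rolling call mirror the question.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the daily climatology (hypothetical values).
clim = xr.DataArray(
    np.arange(366, dtype="float32"),
    dims="dayofyear",
    coords={"dayofyear": np.arange(1, 367)},
    name="zg",
)

# A single chunk along dayofyear, as in the question (this step needs dask).
clim = clim.chunk({"dayofyear": 366})

# The call that raised the ValueError in the question.
smooth = clim.rolling(dayofyear=31, center=True).mean()

# With the fix in place, the result keeps the full coordinate length.
print(smooth.sizes["dayofyear"])
```

On a version with the fix, the result should have the same length as the `dayofyear` coordinate, with NaN in the first and last 15 positions where the centered window is incomplete.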