xarray:dask 数组的滚动平均值与滚动操作中数据和坐标的大小冲突
xarray: rolling mean of dask array conflicting sizes for data and coordinate in rolling operation
我正在尝试对 xarray 中的 dask 数组进行滚动平均。我的问题可能在于滚动平均值之前的重新分块。我收到数据和坐标之间大小冲突的 ValueError。但是,这是在滚动操作中出现的,因为我认为在进入滚动操作之前数组的数据和坐标没有冲突。
抱歉没有创建数据进行测试,但我的项目数据很快就可以使用:
import xarray as xr
remote_data = xr.open_dataarray('http://iridl.ldeo.columbia.edu/SOURCES/.Models'\
'/.SubX/.RSMAS/.CCSM4/.hindcast/.zg/dods',
chunks={'L': 1, 'S': 1})
da = remote_data.isel(P=0,L=0,M=0,X=0,Y=0)
da_day_clim = da.groupby('S.dayofyear').mean('S')
print(da_day_clim)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(1,)>
#Coordinates:
# L timedelta64[ns] 12:00:00
# Y float32 -90.0
# M float32 1.0
# X float32 0.0
# P int32 500
# * dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
# Do a 31-day rolling mean
# da_day_clim.rolling(dayofyear=31, center=True).mean()
# This brings up:
#ValueError: The overlapping depth 30 is larger than your
#smallest chunk size 1. Rechunk your array
#with a larger chunk size or a chunk size that
#more evenly divides the shape of your array.
# Read http://xarray.pydata.org/en/stable/dask.html
# and found http://xarray.pydata.org/en/stable/generated/xarray.Dataset.chunk.html#xarray.Dataset.chunk
# I could make a little PR to add the .chunk() into the ValeError message. Thoughts?
# Rechunk. Played around with a few values but decided on
# the len of dayofyear
da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
print(da_day_clim2)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(366,)>
#Coordinates:
# L timedelta64[ns] 12:00:00
# Y float32 -90.0
# M float32 1.0
# X float32 0.0
# P int32 500
# * dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
# Rolling mean on this
da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-57-6acf382cdd3d> in <module>()
4 da_day_clim = da.groupby('S.dayofyear').mean('S')
5 da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
----> 6 da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/rolling.py in wrapped_func(self, **kwargs)
307 if self.center:
308 values = values[valid]
--> 309 result = DataArray(values, self.obj.coords)
310
311 return result
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, fastpath)
224
225 data = as_compatible_data(data)
--> 226 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
227 variable = Variable(dims, data, attrs, encoding, fastpath=True)
228
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
79 raise ValueError('conflicting sizes for dimension %r: '
80 'length %s on the data but length %s on '
---> 81 'coordinate %r' % (d, sizes[d], s, k))
82
83 if k in sizes and v.shape != (sizes[k],):
ValueError: conflicting sizes for dimension 'dayofyear': length 351 on the data but length 366 on coordinate 'dayofyear'
长度351与366-351=15(window的一半)有关。
这原来是 Xarray 中的一个错误,已在 https://github.com/pydata/xarray/pull/2122
中修复
修复将在即将发布的 Xarray 0.10.4 中进行。
我正在尝试对 xarray 中的 dask 数组进行滚动平均。我的问题可能在于滚动平均值之前的重新分块。我收到数据和坐标之间大小冲突的 ValueError。但是,这是在滚动操作中出现的,因为我认为在进入滚动操作之前数组的数据和坐标没有冲突。
抱歉没有创建数据进行测试,但我的项目数据很快就可以使用:
import xarray as xr
remote_data = xr.open_dataarray('http://iridl.ldeo.columbia.edu/SOURCES/.Models'\
'/.SubX/.RSMAS/.CCSM4/.hindcast/.zg/dods',
chunks={'L': 1, 'S': 1})
da = remote_data.isel(P=0,L=0,M=0,X=0,Y=0)
da_day_clim = da.groupby('S.dayofyear').mean('S')
print(da_day_clim)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(1,)>
#Coordinates:
# L timedelta64[ns] 12:00:00
# Y float32 -90.0
# M float32 1.0
# X float32 0.0
# P int32 500
# * dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
# Do a 31-day rolling mean
# da_day_clim.rolling(dayofyear=31, center=True).mean()
# This brings up:
#ValueError: The overlapping depth 30 is larger than your
#smallest chunk size 1. Rechunk your array
#with a larger chunk size or a chunk size that
#more evenly divides the shape of your array.
# Read http://xarray.pydata.org/en/stable/dask.html
# and found http://xarray.pydata.org/en/stable/generated/xarray.Dataset.chunk.html#xarray.Dataset.chunk
# I could make a little PR to add the .chunk() into the ValeError message. Thoughts?
# Rechunk. Played around with a few values but decided on
# the len of dayofyear
da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
print(da_day_clim2)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(366,)>
#Coordinates:
# L timedelta64[ns] 12:00:00
# Y float32 -90.0
# M float32 1.0
# X float32 0.0
# P int32 500
# * dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
# Rolling mean on this
da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-57-6acf382cdd3d> in <module>()
4 da_day_clim = da.groupby('S.dayofyear').mean('S')
5 da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
----> 6 da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/rolling.py in wrapped_func(self, **kwargs)
307 if self.center:
308 values = values[valid]
--> 309 result = DataArray(values, self.obj.coords)
310
311 return result
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, fastpath)
224
225 data = as_compatible_data(data)
--> 226 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
227 variable = Variable(dims, data, attrs, encoding, fastpath=True)
228
~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
79 raise ValueError('conflicting sizes for dimension %r: '
80 'length %s on the data but length %s on '
---> 81 'coordinate %r' % (d, sizes[d], s, k))
82
83 if k in sizes and v.shape != (sizes[k],):
ValueError: conflicting sizes for dimension 'dayofyear': length 351 on the data but length 366 on coordinate 'dayofyear'
长度351与366-351=15(window的一半)有关。
这原来是 Xarray 中的一个错误,已在 https://github.com/pydata/xarray/pull/2122
中修复修复将在即将发布的 Xarray 0.10.4 中进行。