重新采样 xarray 对象以降低空间分辨率
Resample xarray object to lower resolution spatially
使用 xarray 重新采样以降低空间分辨率
我想将我的 xarray 对象重新采样到较低的空间分辨率(LESS PIXELS)。
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions: (time: 365, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
* x (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
longitude (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
```
我的问题是,如何通过 x,y 坐标将以下内容重新采样为 200x200 网格?
这是在降低变量的空间分辨率。
我试过的是:
output_ds.resample(x=200).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-10fbdf855a5d> in <module>()
----> 1 output_ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
701 group = DataArray(dim_coord, coords=dim_coord.coords,
702 dims=dim_coord.dims, name=RESAMPLE_DIM)
--> 703 grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
704 resampler = self._resample_cls(self, group=group, dim=dim_name,
705 grouper=grouper,
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/core/resample.pyc in __init__(self, freq, closed, label, how, axis, fill_method, limit, loffset, kind, convention, base, **kwargs)
1198 .format(convention))
1199
-> 1200 freq = to_offset(freq)
1201
1202 end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/tseries/frequencies.pyc in to_offset(freq)
174 delta = delta + offset
175 except Exception:
--> 176 raise ValueError(libfreqs._INVALID_FREQ_ERROR.format(freq))
177
178 if delta is None:
ValueError: Invalid frequency: 200
但是我得到了显示的错误。
如何完成 x 和 y 的空间重采样?
理想情况下我想这样做:
output_ds.resample(x=200, y=200).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-e0bfce19e037> in <module>()
----> 1 output_ds.resample(x=200, y=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
679 if len(indexer) != 1:
680 raise ValueError(
--> 681 "Resampling only supported along single dimensions."
682 )
683 dim, freq = indexer.popitem()
ValueError: Resampling only supported along single dimensions.
注意:真实数据有不同的行为
这是我在上面创建的测试数据。关于从netcdf文件读入的真实数据
<xarray.Dataset>
Dimensions: (time: 368, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-28
Dimensions without coordinates: x, y
Data variables:
latitude (y, x) float32 ...
longitude (y, x) float32 ...
Data_Mask (y, x) float32 ...
BHR_SW (time, y, x) float32 ...
Attributes:
CDI: Climate Data Interface version 1.9.5 (http://mpimet.mp...
Conventions: CF-1.4
history: Fri Dec 07 13:29:13 2018: cdo mergetime GLOBALBEDO/Glo...
content: extracted variabel BHR_SW of the original GlobAlbedo (...
metadata_profile: beam
metadata_version: 0.5
CDO: Climate Data Operators version 1.9.5 (http://mpimet.mp...
```
我试过类似的东西:
ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
686 dim_coord = self[dim]
687
--> 688 if isinstance(self.indexes[dim_name], CFTimeIndex):
689 raise NotImplementedError(
690 'Resample is currently not supported along a dimension '
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
309 if key not in self._sizes:
310 raise KeyError(key)
--> 311 return self._variables[key].to_index()
312
313 def __unicode__(self):
KeyError: 'x'
非常感谢任何帮助。
要使用 xarray
最明显的方法是使用 groupby_bins
,但事实证明这非常慢。进入 numpy
并使用超快索引 ([:, :, frequency]
)
可能更有效
nsamples = 200
bins = np.linspace(output_ds.x.min(),
output_ds.x.max(), nsamples).astype(int)
output_ds = output_ds.groupby_bins('x', bins).first()
更新
@clausmichele 的 using coarsen
现在是执行此操作的最佳方法。请注意,粗化现在包括指定所需输出坐标的能力。
原版post
正如 所暗示的那样,groupby 是在 xarray 中执行此操作的唯一方法。重采样只能用于日期时间坐标。
由于 xarray 目前不处理多维 groupby,这必须分两个阶段完成:
# this results in bin centers on 100, 300, ...
reduced = (
output_ds
.groupby(((output_ds.x//200) + 0.5) * 200)
.mean(dim='x')
.groupby(((output_ds.y//200) + 0.5) * 200)
.mean(dim='y'))
如果您只是想对数据进行下采样,可以使用位置切片:
output_ds[:, ::200, ::200]
或者,使用命名的 dims:
output_ds[{'x': slice(None, None, 200), 'y': slice(None, None, 200)}]
最后,还有其他软件包专门设计用于与 xarray 兼容的快速重新网格化。 xESMF 不错。
由于您使用的是已使用 CDO 操作的 NetCDF
文件,因此您还可以使用 CDO SAMPLEGRID
函数或 NCO bilinear_interp
函数:
SAMPLEGRID
(https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) 不插值,它只是删除每第 n 个网格点。
bilinear_interp
(http://nco.sourceforge.net/nco.html#Bilinear-interpolation) 进行插值。
正如您可能想要的平均值、最大值,无论反照率值是多少,您可能更喜欢 NCO bilinear_interp
。但是 CDO SAMPLEGRID
可以为您提供 NOC bilinear_interp
.
所需的 grid_out
最近,coarsen 方法已添加到 xarray 中,我认为这是空间下采样的最佳方法,即使无法使用它来设置所需的最终分辨率并自动计算它。
Coarsen 将在 non-overlapping windows 上执行操作(平均值、最大值、最小值等),根据您设置的 window 大小,您将获得所需的最终分辨率。
来自作者的原始输入数据:
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions: (time: 365, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
* x (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
longitude (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
Coarsen 方法将空间分辨率从 1200x1200 降低到 200x200,我们需要 6x6 windows。
output_ds.coarsen(x=6).mean().coarsen(y=6).mean()
# or output_ds.coarsen(x=6,y=6).mean()
<xarray.Dataset>
Dimensions: (time: 365, x: 200, y: 200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
* x (x) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.02 40.07 40.12 40.17 ... 49.88 49.93 49.98
longitude (y, x) float64 0.03244 0.03244 0.03244 ... 15.52 15.52 15.52
使用 xarray 重新采样以降低空间分辨率
我想将我的 xarray 对象重新采样到较低的空间分辨率(LESS PIXELS)。
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions: (time: 365, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
* x (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
longitude (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
```
我的问题是,如何通过 x,y 坐标将以下内容重新采样为 200x200 网格?
这是在降低变量的空间分辨率。
我试过的是:
output_ds.resample(x=200).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-10fbdf855a5d> in <module>()
----> 1 output_ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
701 group = DataArray(dim_coord, coords=dim_coord.coords,
702 dims=dim_coord.dims, name=RESAMPLE_DIM)
--> 703 grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
704 resampler = self._resample_cls(self, group=group, dim=dim_name,
705 grouper=grouper,
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/core/resample.pyc in __init__(self, freq, closed, label, how, axis, fill_method, limit, loffset, kind, convention, base, **kwargs)
1198 .format(convention))
1199
-> 1200 freq = to_offset(freq)
1201
1202 end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/tseries/frequencies.pyc in to_offset(freq)
174 delta = delta + offset
175 except Exception:
--> 176 raise ValueError(libfreqs._INVALID_FREQ_ERROR.format(freq))
177
178 if delta is None:
ValueError: Invalid frequency: 200
但是我得到了显示的错误。
如何完成 x 和 y 的空间重采样?
理想情况下我想这样做:
output_ds.resample(x=200, y=200).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-e0bfce19e037> in <module>()
----> 1 output_ds.resample(x=200, y=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
679 if len(indexer) != 1:
680 raise ValueError(
--> 681 "Resampling only supported along single dimensions."
682 )
683 dim, freq = indexer.popitem()
ValueError: Resampling only supported along single dimensions.
注意:真实数据有不同的行为
这是我在上面创建的测试数据。关于从netcdf文件读入的真实数据
<xarray.Dataset>
Dimensions: (time: 368, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-28
Dimensions without coordinates: x, y
Data variables:
latitude (y, x) float32 ...
longitude (y, x) float32 ...
Data_Mask (y, x) float32 ...
BHR_SW (time, y, x) float32 ...
Attributes:
CDI: Climate Data Interface version 1.9.5 (http://mpimet.mp...
Conventions: CF-1.4
history: Fri Dec 07 13:29:13 2018: cdo mergetime GLOBALBEDO/Glo...
content: extracted variabel BHR_SW of the original GlobAlbedo (...
metadata_profile: beam
metadata_version: 0.5
CDO: Climate Data Operators version 1.9.5 (http://mpimet.mp...
```
我试过类似的东西:
ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
686 dim_coord = self[dim]
687
--> 688 if isinstance(self.indexes[dim_name], CFTimeIndex):
689 raise NotImplementedError(
690 'Resample is currently not supported along a dimension '
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
309 if key not in self._sizes:
310 raise KeyError(key)
--> 311 return self._variables[key].to_index()
312
313 def __unicode__(self):
KeyError: 'x'
非常感谢任何帮助。
要使用 xarray
最明显的方法是使用 groupby_bins
,但事实证明这非常慢。进入 numpy
并使用超快索引 ([:, :, frequency]
)
nsamples = 200
bins = np.linspace(output_ds.x.min(),
output_ds.x.max(), nsamples).astype(int)
output_ds = output_ds.groupby_bins('x', bins).first()
更新
@clausmichele 的 coarsen
现在是执行此操作的最佳方法。请注意,粗化现在包括指定所需输出坐标的能力。
原版post
正如
由于 xarray 目前不处理多维 groupby,这必须分两个阶段完成:
# this results in bin centers on 100, 300, ...
reduced = (
output_ds
.groupby(((output_ds.x//200) + 0.5) * 200)
.mean(dim='x')
.groupby(((output_ds.y//200) + 0.5) * 200)
.mean(dim='y'))
如果您只是想对数据进行下采样,可以使用位置切片:
output_ds[:, ::200, ::200]
或者,使用命名的 dims:
output_ds[{'x': slice(None, None, 200), 'y': slice(None, None, 200)}]
最后,还有其他软件包专门设计用于与 xarray 兼容的快速重新网格化。 xESMF 不错。
由于您使用的是已使用 CDO 操作的 NetCDF
文件,因此您还可以使用 CDO SAMPLEGRID
函数或 NCO bilinear_interp
函数:
SAMPLEGRID
(https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) 不插值,它只是删除每第 n 个网格点。
bilinear_interp
(http://nco.sourceforge.net/nco.html#Bilinear-interpolation) 进行插值。
正如您可能想要的平均值、最大值,无论反照率值是多少,您可能更喜欢 NCO bilinear_interp
。但是 CDO SAMPLEGRID
可以为您提供 NOC bilinear_interp
.
grid_out
最近,coarsen 方法已添加到 xarray 中,我认为这是空间下采样的最佳方法,即使无法使用它来设置所需的最终分辨率并自动计算它。 Coarsen 将在 non-overlapping windows 上执行操作(平均值、最大值、最小值等),根据您设置的 window 大小,您将获得所需的最终分辨率。
来自作者的原始输入数据:
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions: (time: 365, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
* x (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
longitude (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
Coarsen 方法将空间分辨率从 1200x1200 降低到 200x200,我们需要 6x6 windows。
output_ds.coarsen(x=6).mean().coarsen(y=6).mean()
# or output_ds.coarsen(x=6,y=6).mean()
<xarray.Dataset>
Dimensions: (time: 365, x: 200, y: 200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
* x (x) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.02 40.07 40.12 40.17 ... 49.88 49.93 49.98
longitude (y, x) float64 0.03244 0.03244 0.03244 ... 15.52 15.52 15.52