How to add new variables to an xarray Dataset using groupby and apply?
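I am having a hard time understanding how xarray.groupby really works. I am trying to apply a given function "f" to each group of an xarray DatasetGroupBy collection, such that "f" adds new variables to each group of the original xr.Dataset.
A brief introduction:
My problem is common in geoscience, remote sensing, and similar fields.
The goal is to apply a given function over an array pixel by pixel (or grid cell by grid cell).
Example
Suppose I want to evaluate the wind speed components (u, v) of a wind field over a given region relative to a new direction. That is, I want rotated versions of the 'u' and 'v' components: u_rotated and v_rotated.
Let's assume this new direction is rotated 30° counterclockwise at every pixel position in the wind field. The new wind components would then be (u_30_degrees and v_30_degrees).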
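For reference, a counterclockwise rotation by an angle θ acts on each (u, v) pair as written below. The symbols u', v' and θ are notation introduced here for illustration, and the numbers use the example values u = 1 and v = 0.3 from the reproducible dataset in this post.

```latex
\begin{aligned}
u' &= u\cos\theta - v\sin\theta \\
v' &= u\sin\theta + v\cos\theta
\end{aligned}
\qquad
\theta = 30^\circ,\ u = 1,\ v = 0.3:\quad
u' \approx 0.8660 - 0.1500 = 0.7160,\quad
v' \approx 0.5000 + 0.2598 = 0.7598
```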
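My first attempt was to stack the x and y coordinates (or longitude and latitude) into a new dimension called pixel (named 'point' in the code), then group by this new dimension and apply a function that rotates the wind vector.
Here is a snippet of my initial attempt: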
# Imports needed by the snippets in this post
import numpy as np
import xarray as xr

# First, let's create some functions for vector rotation:
def rotate_2D_vector_per_given_degrees(array2D, angle=30):
    '''
    Parameters
    ----------
    array2D : 1D numpy array of length 2
    angle : float, optional
        Rotation angle in degrees. The default is 30.

    Returns
    -------
    Rotated_2D_Vector : 1D numpy array of length 2
    '''
    R = get_rotation_matrix(rotation=angle)
    Rotated_2D_Vector = np.dot(R, array2D)
    return Rotated_2D_Vector

def get_rotation_matrix(rotation=90):
    '''
    Description:
        This function creates a rotation matrix given a defined rotation angle (in degrees)
    Parameters:
        rotation: in degrees
    Returns:
        rotation matrix
    '''
    theta = np.radians(rotation)  # convert degrees to radians
    c, s = np.cos(theta), np.sin(theta)
    R = np.array(((c, -s), (s, c)))
    return R

# Then let's create a reproducible dataset for analysis:
u_wind = xr.DataArray(np.ones(shape=(20, 30)),
                      dims=('x', 'y'),
                      coords={'x': np.arange(0, 20),
                              'y': np.arange(0, 30)},
                      name='u')
v_wind = xr.DataArray(np.ones(shape=(20, 30)) * 0.3,
                      dims=('x', 'y'),
                      coords={'x': np.arange(0, 20),
                              'y': np.arange(0, 30)},
                      name='v')
data = xr.merge([u_wind, v_wind])

# Let's create the function that will be applied to each group in the dataset:
def rotate_wind(array, degrees=30):
    # In the next line, I create a 1-dimensional vector of length 2
    # holding the wind speed of the u and v components, respectively.
    # The best solution I found was to convert the dataset into a single xr.DataArray
    # by stacking the 'u' and 'v' components into a single dimension named 'wind'.
    vector = array.to_array(dim='wind').values
    # Now, I rotate the wind vector by the given rotation angle in degrees
    Rotated = rotate_2D_vector_per_given_degrees(vector, degrees)
    # Treat values that are zero up to floating-point round-off (e.g. 6.1e-17) as exactly 0.
    Rotated = np.where(np.abs(Rotated) < 1e-15, 0, Rotated)
    # Sanity check for each point position:
    print('Coords: ', array['point'].values,
          'Wind Speed: ', vector,
          'Response :', Rotated,
          end='\n\n' + '-' * 20 + '\n')
    # Names of the original data variables ('u' and 'v'), taken from the group itself
    components = list(array.data_vars)
    for dim, value in zip(components, Rotated):
        array['{0}_rotated_{1}'.format(dim, degrees)] = value
    return array

# Finally, let's stack our dataset per grid point, group by this new dimension, and apply the desired function
# (recent xarray versions also offer .map for the same purpose):
stacked = data.stack(point=['x', 'y'])
stacked = stacked.groupby('point').apply(rotate_wind)
# Let's unstack the data to recover the original dataset layout:
data = stacked.unstack('point')
# Let's check whether the function worked correctly
data.to_dataframe().head(30)
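Although the example above apparently works, I am still unsure whether its results are correct, or whether the groupby-apply implementation is efficient (clean, non-redundant, fast, etc.).
Any insights are welcome!
Best regards,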
You can simply multiply the whole array by the rotation matrix, e.g. np.dot(R, da).
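To make the point above concrete, here is a minimal numpy-only sketch showing that a single matrix product over the stacked components gives the same result as rotating one grid point at a time. The helper name rotation_matrix_deg and the random test data are assumptions made for this illustration, not part of the original post.

```python
import numpy as np


def rotation_matrix_deg(angle_deg):
    """Hypothetical helper: 2x2 counterclockwise rotation matrix for an angle in degrees."""
    theta = np.radians(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


R = rotation_matrix_deg(30)

# Wind components stacked as rows: shape (2, n_points), made-up values
rng = np.random.default_rng(0)
wind = rng.normal(size=(2, 600))

# One matrix product rotates every column (every grid point) at once
rotated_all = np.dot(R, wind)

# The same rotation done one point at a time, which is conceptually
# what a per-point groupby/apply does
rotated_loop = np.stack([R @ wind[:, i] for i in range(wind.shape[1])], axis=1)

print(np.allclose(rotated_all, rotated_loop))
```

Because the rotation is linear and identical for every point, no grouping is needed; the loop only adds Python overhead per grid point.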
So, if you have the following Dataset:
>>> dims = ("x", "y")
>>> sizes = (20, 30)
>>> ds = xr.Dataset(
...     data_vars=dict(u=(dims, np.ones(sizes)), v=(dims, np.ones(sizes) * 0.3)),
...     coords={d: np.arange(s) for d, s in zip(dims, sizes)},
... )
>>> ds
<xarray.Dataset>
Dimensions: (x: 20, y: 30)
Coordinates:
* x (x) int64 0 1 2 3 4 ... 16 17 18 19
* y (y) int64 0 1 2 3 4 ... 26 27 28 29
Data variables:
u (x, y) float64 1.0 1.0 ... 1.0 1.0
v (x, y) float64 0.3 0.3 ... 0.3 0.3
Convert it to the following DataArray, as you did:
>>> da = ds.stack(point=["x", "y"]).to_array(dim="wind")
>>> da
<xarray.DataArray (wind: 2, point: 600)>
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3]])
Coordinates:
* point (point) MultiIndex
- x (point) int64 0 0 0 0 ... 19 19 19 19
- y (point) int64 0 1 2 3 ... 26 27 28 29
* wind (wind) <U1 'u' 'v'
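Then np.dot(R, da) gives you the rotated values, where R is the 2x2 rotation matrix (for example, the one returned by get_rotation_matrix(30) in the question):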
>>> np.dot(R, da).shape
(2, 600)
>>> type(np.dot(R, da))
<class 'numpy.ndarray'>
But the result is a numpy ndarray. So if you want to get back to an xarray DataArray, you can use a trick like this (other solutions may exist):
def rotate(da, dim, angle):
    # Put dim first
    dims_orig = da.dims
    da = da.transpose(dim, ...)
    # Rotate (get_rotation_matrix is the helper defined in the question)
    R = get_rotation_matrix(angle)
    rotated = da.copy(data=np.dot(R, da), deep=True)
    # Rename values of "dim" coord according to rotation
    rotated[dim] = [f"{orig}_rotated_{angle}" for orig in da[dim].values]
    # Transpose back to orig
    return rotated.transpose(*dims_orig)
And use it like this:
>>> da_rotated = rotate(da, dim="wind", angle=30)
>>> da_rotated
<xarray.DataArray (wind: 2, point: 600)>
array([[0.7160254 , 0.7160254 , 0.7160254 , ..., 0.7160254 , 0.7160254 ,
0.7160254 ],
[0.75980762, 0.75980762, 0.75980762, ..., 0.75980762, 0.75980762,
0.75980762]])
Coordinates:
* point (point) MultiIndex
- x (point) int64 0 0 0 0 ... 19 19 19 19
- y (point) int64 0 1 2 3 ... 26 27 28 29
* wind (wind) <U12 'u_rotated_30' 'v_rotated_30'
Finally, you can get back to the original Dataset structure:
>>> ds_rotated = da_rotated.to_dataset(dim="wind").unstack(dim="point")
>>> ds_rotated
<xarray.Dataset>
Dimensions: (x: 20, y: 30)
Coordinates:
* x (x) int64 0 1 2 3 ... 17 18 19
* y (y) int64 0 1 2 3 ... 27 28 29
Data variables:
u_rotated_30 (x, y) float64 0.716 ... 0.716
v_rotated_30 (x, y) float64 0.7598 ... 0.7598
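Since the question asks about adding the rotated components as new variables, one possible alternative is to skip the stack/to_array round trip and write the rotation as plain arithmetic on the Dataset variables. This is only a sketch under the assumption that the components are named 'u' and 'v'; the helper name add_rotated_wind is hypothetical.

```python
import numpy as np
import xarray as xr

dims = ("x", "y")
sizes = (20, 30)
ds = xr.Dataset(
    data_vars=dict(u=(dims, np.ones(sizes)), v=(dims, np.ones(sizes) * 0.3)),
    coords={d: np.arange(s) for d, s in zip(dims, sizes)},
)


def add_rotated_wind(ds, angle=30):
    """Hypothetical helper: return a new Dataset with rotated components added.

    Assumes the wind components are stored as variables named 'u' and 'v'.
    """
    theta = np.radians(angle)
    c, s = np.cos(theta), np.sin(theta)
    # Element-wise arithmetic keeps dims and coords, so no stacking is needed
    return ds.assign(
        {
            f"u_rotated_{angle}": c * ds["u"] - s * ds["v"],
            f"v_rotated_{angle}": s * ds["u"] + c * ds["v"],
        }
    )


ds_with_rotated = add_rotated_wind(ds, angle=30)
print(list(ds_with_rotated.data_vars))
```

Unlike the to_dataset/unstack route, this keeps the original u and v next to the rotated variables, which is closer to what the groupby attempt in the question produces.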