How to reshape an xarray dataset by collapsing a coordinate
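I currently have a dataset that, when opened with xarray, contains three coordinates: x, y, band. The band coordinate holds temperature and dew point at each of 4 different time intervals, so there are 8 bands in total. Is there a way to reshape this so that I have x, y, band, time, where the band coordinate has length 2 and the time coordinate has length 4?
I thought I could add a new coordinate named time and then add the bands in, but
ds = ds.assign_coords(time=[1,2,3,4])
returns ValueError: cannot add coordinates with new dimensions to a DataArray.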
You can re-assign the "band" coordinate to a MultiIndex:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import xarray as xr
In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])
In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
   ...:     [
   ...:         [1, 1, 1, 1, 2, 2, 2, 2],
   ...:         pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
   ...:     ],
   ...:     names=['band_stacked', 'time'],
   ...: )
In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
[3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
[7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
[5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
[[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
[9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
[6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
[7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
  * band          (band) MultiIndex
  - band_stacked  (band) int64 1 1 1 1 2 2 2 2
  - time          (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y
You can then expand the dimensionality by unstacking:
In [7]: da.unstack('band').rename({'band_stacked': 'band'})
Out[7]:
<xarray.DataArray (x: 4, y: 4, band: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
5.23808010e-01],
[8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
1.54739786e-02]],
...
[[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
6.23766131e-02],
[5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
2.44385584e-01]]]])
Coordinates:
  * band     (band) int64 1 2
  * time     (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
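For reference, here is the same idea as a standalone script. It is only a sketch under a few assumptions: the data is synthetic, the band ids 1 and 2 and the four dates are placeholders, and instead of assigning a pandas MultiIndex directly it builds the index with assign_coords followed by set_index, which should give an equivalent result. The final transpose only makes the dimension order explicit.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real data: 8 layers = 2 variables x 4 times.
data = np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8)
da = xr.DataArray(data, dims=["x", "y", "band"])

# Placeholder time stamps (assumed, not taken from any real file).
times = pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"])

# Label every position along "band" with its variable id and its time.
labelled = da.assign_coords(
    band_stacked=("band", [1, 1, 1, 1, 2, 2, 2, 2]),
    time=("band", np.tile(times.values, 2)),
)

# Combine the two labels into a MultiIndex on "band", unstack it into two
# dimensions, and give the variable level the name "band" again.
unstacked = (
    labelled.set_index(band=["band_stacked", "time"])
    .unstack("band")
    .rename({"band_stacked": "band"})
    .transpose("x", "y", "band", "time")
)

print(unstacked.sizes)
```

The band_stacked name is needed only while the original band dimension still exists; once that dimension has been unstacked, the level can be renamed to band.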
Another, more manual option is to reshape in numpy and create a new DataArray. Note that for larger arrays, this manual reshape is much faster:
In [8]: reshaped = xr.DataArray(
   ...:     da.data.reshape((4, 4, 2, 4)),
   ...:     dims=['x', 'y', 'band', 'time'],
   ...:     coords={
   ...:         'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
   ...:         'band': [1, 2],
   ...:     },
   ...: )
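The manual reshape is only correct if the 8 layers are stored band-major, that is, all four times of the first variable followed by all four times of the second. The sketch below uses made-up layer ids (10 * band + time) to check that layout, and shows one possible way to handle the other common layout, where the two variables alternate at each time step. The array sizes and all names here are illustrative assumptions.

```python
import numpy as np
import xarray as xr

n_band, n_time = 2, 4

# Encode each layer as 10 * band + time so the layout is easy to inspect.
layer_ids = np.array([10 * b + t for b in range(n_band) for t in range(n_time)])
data = np.broadcast_to(layer_ids, (3, 3, n_band * n_time)).copy()
da = xr.DataArray(data, dims=["x", "y", "band"])

# Band-major layout: split the last axis directly into (band, time).
reshaped = xr.DataArray(
    da.data.reshape(da.sizes["x"], da.sizes["y"], n_band, n_time),
    dims=["x", "y", "band", "time"],
    coords={"band": [0, 1], "time": np.arange(n_time)},
)

# Time-major (interleaved) layout: split into (time, band) first,
# then swap the two new axes to get (band, time).
interleaved_ids = np.array(
    [10 * b + t for t in range(n_time) for b in range(n_band)]
)
interleaved = np.broadcast_to(interleaved_ids, (3, 3, n_band * n_time))
fixed = interleaved.reshape(3, 3, n_time, n_band).swapaxes(-1, -2)

# Every pixel of band 1 at time 2 should carry the id 12.
print(reshaped.sel(band=1, time=2).values)
```

Checking a single known layer like this before trusting the reshape is cheap, since a wrong axis order produces an array of the right shape with silently mislabelled data.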
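Note that if your data is chunked (and you would like to keep it that way), your options are more limited; see the dask documentation on reshaping dask arrays. The first (MultiIndex unstack) approach does work with dask arrays as long as the arrays are not chunked along the unstacked dimension. See the dask documentation for an example.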