How to reshape an xarray dataset by collapsing a coordinate

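I currently have a dataset that, when opened with xarray, contains three coordinates: x, y, band. The band coordinate holds temperature and dew point at each of 4 different time intervals, so there are 8 bands in total. Is there a way to reshape this so that I have x, y, band, time, where the band coordinate now has length 2 and the time coordinate has length 4?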

I thought I could add a new coordinate named time and then add the bands in, but
ds = ds.assign_coords(time=[1,2,3,4])

returns ValueError: cannot add coordinates with new dimensions to a DataArray.

You can re-assign the "band" coordinate to a MultiIndex:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])

In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
   ...:     [
   ...:         [1, 1, 1, 1, 2, 2, 2, 2],
   ...:         pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
   ...:     ],
   ...:     names=['band_stacked', 'time'],
   ...: )

In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
         8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
        [3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
         4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
        [7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
         7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
        [5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
         6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
       [[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
         2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
        [9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
         6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
        [6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
         5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
        [7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
         5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
  * band          (band) MultiIndex
  - band_stacked  (band) int64 1 1 1 1 2 2 2 2
  - time          (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y

You can then expand the dimensions by unstacking:

In [7]: da.unstack('band').rename({'band_stacked': 'band'})
Out[7]:
<xarray.DataArray (x: 4, y: 4, band: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
          5.23808010e-01],
         [8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
          1.54739786e-02]],
...
        [[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
          6.23766131e-02],
         [5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
          2.44385584e-01]]]])
Coordinates:
  * band     (band) int64 1 2
  * time     (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
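
For reference, the same stack-then-unstack idea can be written as one self-contained script. This is a minimal sketch rather than the exact code from the session above: it assumes the 8 bands are ordered with the 4 time steps of the first variable first, followed by the 4 time steps of the second, and it builds the MultiIndex with set_index instead of assigning a pandas.MultiIndex directly (recent xarray releases may warn about the direct assignment). The np.arange data is only a stand-in so the result is easy to check.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in data: 8 bands = 2 variables x 4 time steps (assumed ordering).
data = np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8)
da = xr.DataArray(data, dims=["x", "y", "band"])

times = pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"])

# Label every position along "band" with its variable number and its time.
labelled = da.assign_coords(
    band_stacked=("band", [1, 1, 1, 1, 2, 2, 2, 2]),
    time=("band", np.tile(times.values, 2)),
)

# Combine the two labels into a MultiIndex on "band", then split it apart.
stacked = labelled.set_index(band=["band_stacked", "time"])
unstacked = (
    stacked.unstack("band")
    .rename({"band_stacked": "band"})
    .transpose("x", "y", "band", "time")
)

print(unstacked.sizes)
```

Selecting by label then works as usual, for example unstacked.sel(band=2, time="2022-01-01").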

Another, more manual option is to reshape in numpy and create a new DataArray. Note that this manual reshape is much faster for larger arrays:

In [8]: reshaped = xr.DataArray(
   ...:     da.data.reshape((4, 4, 2, 4)),
   ...:     dims=['x', 'y', 'band', 'time'],
   ...:     coords={
   ...:         'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
   ...:         'band': [1, 2],
   ...:     },
   ...: )
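
The reshape only lines up with the labels if the band axis is ordered with time varying fastest inside each variable, which is what the example above assumes. Here is a small sketch that makes that assumption explicit and checks one element against the original array (the np.arange data and the names n_band and n_time are illustrative, not from the original dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.arange(4 * 4 * 8).reshape(4, 4, 8)
da = xr.DataArray(data, dims=["x", "y", "band"])

n_band, n_time = 2, 4  # assumed layout: band-major, time varies fastest

reshaped = xr.DataArray(
    da.data.reshape(da.sizes["x"], da.sizes["y"], n_band, n_time),
    dims=["x", "y", "band", "time"],
    coords={
        "band": [1, 2],
        "time": pd.to_datetime(
            ["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"]
        ),
    },
)

# Under this layout, flat band index k maps to (band, time) = divmod(k, n_time).
k = 6
b, t = divmod(k, n_time)
same = np.array_equal(
    reshaped.isel(band=b, time=t).values, da.isel(band=k).values
)
print(same)
```

If the bands were instead interleaved (variable 1 at time 1, variable 2 at time 1, and so on), the reshape target would be (..., n_time, n_band) with dims ['x', 'y', 'time', 'band'].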

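Note that if your data is chunked (and assuming you'd like to keep it that way), your options are more limited; see the dask documentation on reshaping dask arrays. The first (MultiIndex unstack) approach does work with dask arrays as long as the arrays are not chunked along the unstacked dimension. See the dask documentation for an example.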
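
If the array is dask-backed, one way to meet the "not chunked along the unstacked dimension" condition is to rechunk so that band sits in a single chunk before unstacking. A rough sketch, assuming dask is installed and using made-up chunk sizes; the set_index call stands in for the MultiIndex assignment shown earlier:

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8)
times = pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"])

# Chunk along x and y only; -1 keeps the whole "band" axis in one chunk.
lazy = xr.DataArray(data, dims=["x", "y", "band"]).chunk(
    {"x": 2, "y": 2, "band": -1}
)

lazy_unstacked = (
    lazy.assign_coords(
        band_stacked=("band", [1, 1, 1, 1, 2, 2, 2, 2]),
        time=("band", np.tile(times.values, 2)),
    )
    .set_index(band=["band_stacked", "time"])
    .unstack("band")
    .rename({"band_stacked": "band"})
    .transpose("x", "y", "band", "time")
)

# compute() materialises the result as a NumPy-backed array.
result = lazy_unstacked.compute()
print(result.sizes)
```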