How can I speed up xarray resample (much slower than pandas resample)

Question

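Here is an MWE that resamples a time series in xarray and in pandas. The 10Min resample takes 6.8 s in xarray and 0.003 s in pandas. Is there a way to get pandas speed in xarray? The pandas resample time seems independent of the input frequency, while the xarray time scales with it. Because the number of samples is fixed at 100000, a longer sampling period spans more hours and so produces more hourly bins.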

import time

import numpy as np
import pandas as pd
import xarray as xr

def make_ds(freq):
    """Build a Dataset with 100000 random samples spaced by `freq`."""
    size = 100000
    times = pd.date_range('2000-01-01', periods=size, freq=freq)
    ds = xr.Dataset({
        'foo': xr.DataArray(
            data   = np.random.random(size),
            dims   = ['time'],
            coords = {'time': times}
        )})
    return ds

for f in ["1s", "1Min", "10Min"]:
    ds = make_ds(f)

    # xarray: hourly means ('1h'; the uppercase 'H' alias is deprecated in pandas >= 2.2)
    start = time.perf_counter()
    ds_r = ds.resample({'time': "1h"}).mean()
    print(f, 'xr', str(time.perf_counter() - start))

    # pandas: convert to a DataFrame and resample the same way
    start = time.perf_counter()
    ds_r = ds.to_dataframe().resample("1h").mean()
    print(f, 'pd', str(time.perf_counter() - start))

1s xr 0.040313720703125
1s pd 0.0033435821533203125
1Min xr 0.5757267475128174
1Min pd 0.0025794506072998047
10Min xr 6.798743486404419
10Min pd 0.0029947757720947266
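One way to see why the xarray timing grows with the input period: the input length is fixed at 100000 samples, so a longer sampling interval spans more hours and the hourly resample has more groups to produce. A pandas-only sketch that counts those groups (`n_hourly_bins` is a hypothetical helper, and `'1min'`/`'10min'` are simply the lowercase spellings of the question's frequencies):

```python
import pandas as pd

SIZE = 100_000  # same number of samples as make_ds() in the question


def n_hourly_bins(freq: str, size: int = SIZE) -> int:
    """Hypothetical helper: number of 1-hour bins an hourly resample produces."""
    times = pd.date_range("2000-01-01", periods=size, freq=freq)
    # resample() emits one row per hourly bin from the first to the last timestamp
    return len(pd.Series(1.0, index=times).resample("1h").count())


for f in ["1s", "1min", "10min"]:
    print(f, n_hourly_bins(f))
```

This prints 28, 1667 and 16667 bins respectively, so the number of groups xarray has to handle grows along with its timings, while the pandas timings stay roughly flat.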

Answer 1

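According to an xarray GitHub issue, this is an implementation problem. The workaround is to do the resampling (which is really a GroupBy) outside xarray. My solution is to use the fast pandas resample and then rebuild the xarray Dataset: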

# Resample in pandas (fast), then rebuild an xarray Dataset, keeping the attrs
df_h = ds.to_dataframe().resample("1h").mean()  # what we want (quickly), but in pandas form
vals = [xr.DataArray(data=df_h[c].values, dims=['time'], coords={'time': df_h.index}, attrs=ds[c].attrs)
        for c in df_h.columns]
ds_h = xr.Dataset(dict(zip(df_h.columns, vals)), attrs=ds.attrs)
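The same pandas round trip can be written more compactly with `DataFrame.to_xarray()`, copying the attributes back explicitly because the trip through pandas is not guaranteed to preserve them. A minimal sketch, assuming every data variable depends only on `time` and there are no extra non-index coordinates; `fast_resample_mean` is a hypothetical helper name, not an xarray or pandas API:

```python
import numpy as np
import pandas as pd
import xarray as xr


def fast_resample_mean(ds: xr.Dataset, freq: str, dim: str = "time") -> xr.Dataset:
    """Hypothetical helper: mean-resample `ds` along `dim` using pandas.

    Assumes every data variable is 1-D along `dim` (no other dimensions
    and no extra non-index coordinates).
    """
    # Group in pandas, which is fast for this case.
    df = ds.to_dataframe().resample(freq).mean()
    # Make sure the index name becomes the dimension name on the way back.
    out = df.rename_axis(dim).to_xarray()
    # Restore Dataset- and variable-level attrs explicitly.
    out.attrs = dict(ds.attrs)
    for name in out.data_vars:
        out[name].attrs = dict(ds[name].attrs)
    return out


# Usage on a small version of the question's data (1000 samples at 10min).
times = pd.date_range("2000-01-01", periods=1_000, freq="10min")
ds = xr.Dataset(
    {"foo": ("time", np.random.random(times.size))},
    coords={"time": times},
    attrs={"source": "demo"},
)
ds["foo"].attrs["units"] = "m"

ds_h = fast_resample_mean(ds, "1h")
print(ds_h)
```

The values should match `ds.resample(time="1h").mean()`; the only design choice is that the grouping itself happens in pandas.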
