Use xarray and numba packages to read data and calculate climatology

To speed up computation with the xarray package, I tried adding numba's guvectorize to my functions, but I ran into several problems:

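  1. If I write two functions, read_pr and day_clim, the input of day_clim is no longer an xarray object, because guvectorize is set to float64[:], float64[:]. As a result, the groupby function does not work. I also tried xr.core.dataarray.DataArray[:], xr.core.dataarray.DataArray[:], but that raises NameError: name 'xr' is not defined
  2. I also want to apply @guvectorize to read_pr. However, guvectorize requires the types and shapes to be declared up front, and the shape of each dimension has to stay the same. For example,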
    (m),(n),(n) -> (m,n)  # ok
    (n),() -> (m,n)  # error

The inputs of read_pr are strings and floats (shape: ()), while the output is an xarray DataArray (shape: (l,m,n)).

Code:

from numba import float64, guvectorize
import numba
import numpy as np
import xarray as xr

path = '/data3/USERS/waynetsai/pyaos_wks_samples/data/'
fname = 'cmorph_sample.nc'

lats = -20
latn =  30
lon1 =  89
lon2 = 171
time1 = '2000-01-01'
time2 = '2020-12-31'


def read_pr(path, fname, time1, time2, lats, latn, lon1, lon2):
    with xr.open_dataset(path + fname) as pr_ds:
        pr = (pr_ds.sel(time=slice(time1,time2),
                               lat=slice(lats,latn),
                               lon=slice(lon1,lon2)).cmorph)
    return pr

pr = xr.apply_ufunc(read_pr, path, fname, time1, time2, lats, latn, lon1, lon2)

@guvectorize(
    "(float64[:], float64[:])",
    "(l,m,n) -> (l,m,n)"
)
def day_clim(pr):
    prGB = pr.groupby("time.day")
    prDayClim = prGB.mean("time")
    return prDayClim
prDayClim = xr.apply_ufunc(day_clim, pr)

All suggestions are welcome!

Numba does not support the functions of the xarray module, so you cannot use Numba to speed up read_pr and day_clim as written. To use Numba for functions like these, you would first need to extract NumPy arrays from the xarray objects. Even then, NumPy has no groupby function yet, so you would have to rewrite that part yourself. And even if you did, I would not expect Numba to be faster in this case (unless you write a very optimized implementation yourself).
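To give an idea of what such a rewrite might involve, here is a minimal sketch of a groupby-style mean on plain NumPy arrays. It is not taken from the question's code: the function name day_of_month_mean and the synthetic input are made up for illustration, and it assumes that the grouping key is the day of the month (1 to 31), which is what "time.day" selects in xarray. With real data, the two inputs could be taken from pr.values and pr["time"].dt.day.values. The function is written as a simple loop, the kind of code one would try compiling with Numba, but Numba is deliberately left out here.

```python
import numpy as np


def day_of_month_mean(values, day):
    """Average `values` (time, lat, lon) over all time steps that share
    the same day of the month. Both arguments are plain NumPy arrays."""
    n_time, n_lat, n_lon = values.shape
    total = np.zeros((31, n_lat, n_lon))   # running sums for days 1..31
    count = np.zeros(31)                   # number of samples per day
    for t in range(n_time):
        k = day[t] - 1                     # day 1 is stored at index 0
        total[k] += values[t]
        count[k] += 1
    # Days that never occur are left as NaN instead of dividing by zero.
    clim = np.full((31, n_lat, n_lon), np.nan)
    present = count > 0
    clim[present] = total[present] / count[present][:, None, None]
    return clim


# Synthetic stand-in for the precipitation data (hypothetical values):
# 60 daily time steps on a 2 x 3 grid.
dates = np.arange("2000-01-01", "2000-03-01", dtype="datetime64[D]")
day = (dates - dates.astype("datetime64[M]")).astype(int) + 1
values = np.random.default_rng(0).random((dates.size, 2, 3))

clim = day_of_month_mean(values, day)
print(clim.shape)
```

The result has one slice per day of the month, which should correspond to what pr.groupby("time.day").mean("time") returns in xarray. Whether compiling this loop would actually beat xarray's own implementation is something you would have to measure.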