Use xarray and numba packages to read data and calculate climatology
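To speed up computations with the xarray package, I tried adding numba's guvectorize to my functions, but I ran into a few problems:
- If I write two functions, read_pr and day_clim, the input to day_clim is no longer an xarray object, because guvectorize is set to float64[:], float64[:]. As a result, the groupby function does not work. I also tried xr.core.dataarray.DataArray[:], xr.core.dataarray.DataArray[:], but got the error NameError: name 'xr' is not defined.
- I would also like to apply @guvectorize to read_pr. However, guvectorize requires the types and shapes to be declared up front, and the shape of each dimension should stay the same. For example,
(m),(n),(n) -> (m,n) # ok
(n),() -> (m,n) # error
The inputs to read_pr are strings and floats (shape: ()), while the output is an xarray object (type: , shape: (l,m,n)).
Code: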
from numba import float64, guvectorize
import numba
import numpy as np
import xarray as xr

path = '/data3/USERS/waynetsai/pyaos_wks_samples/data/'
fname = 'cmorph_sample.nc'
lats = -20
latn = 30
lon1 = 89
lon2 = 171
time1 = '2000-01-01'
time2 = '2020-12-31'

def read_pr(path, fname, time1, time2, lats, latn, lon1, lon2):
    with xr.open_dataset(path + fname) as pr_ds:
        pr = (pr_ds.sel(time=slice(time1,time2),
                        lat=slice(lats,latn),
                        lon=slice(lon1,lon2)).cmorph)
    return pr

pr = xr.apply_ufunc(read_pr, path, fname, time1, time2, lats, latn, lon1, lon2)

@guvectorize(
    "(float64[:], float64[:])",
    "(l,m,n) -> (l,m,n)"
)
def day_clim(pr):
    prGB = pr.groupby("time.day")
    prDayClim = prGB.mean("time")
    return prDayClim

prDayClim = xr.apply_ufunc(day_clim, pr)
All suggestions are welcome!
Numba does not support the functions of the xarray module, so you cannot use Numba to speed up read_pr and day_clim. If you want to use Numba for functions like these, you first need to get NumPy arrays out of the xarray objects somehow, and even then there is no groupby function in NumPy yet. That means you would need to rewrite the function yourself, and even if you did, I would not expect Numba to be faster in this case (unless you write a very optimized implementation yourself).
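To give an idea of what "rewriting the function" could look like, here is a minimal sketch of a grouped mean over the time axis in plain NumPy. The name group_mean_along_time and the demo arrays are made up for illustration. I am assuming you would pull the raw data and the day labels out of xarray first, for example with pr.values and pr["time"].dt.day.values. Unlike xarray's mean, this sketch does not skip NaN values.

```python
import numpy as np


def group_mean_along_time(values, labels):
    """Average `values` over axis 0 separately for each distinct label.

    values: array of shape (time, lat, lon)
    labels: 1-D integer array of length `time` (e.g. day of month)
    Returns (unique_labels, means), means has shape (n_groups, lat, lon).
    Hypothetical helper, not part of NumPy or xarray.
    """
    values = np.asarray(values, dtype=np.float64)
    labels = np.asarray(labels).ravel()

    # Map every time step to the index of its group.
    unique_labels, inverse = np.unique(labels, return_inverse=True)
    inverse = inverse.ravel()

    sums = np.zeros((unique_labels.size,) + values.shape[1:])
    # Unbuffered add, so repeated group indices accumulate correctly.
    np.add.at(sums, inverse, values)

    # Number of time steps per group, reshaped to broadcast over lat/lon.
    counts = np.bincount(inverse, minlength=unique_labels.size)
    counts = counts.reshape((-1,) + (1,) * (values.ndim - 1))

    # Note: NaN values are NOT skipped here.
    return unique_labels, sums / counts


# Small synthetic example: 6 time steps on a 2 x 3 grid.
rng = np.random.default_rng(0)
demo_values = rng.random((6, 2, 3))
demo_days = np.array([1, 2, 1, 3, 2, 1])

days, clim = group_mean_along_time(demo_values, demo_days)
print(days, clim.shape)
```

Only a loop-based version of this kind of function, working on plain arrays, would be something Numba can compile. Whether that beats xarray's own groupby would have to be measured on your data.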