Merging xarray datasets with same extent but different spatial resolution
I have a satellite-based solar-induced fluorescence (SIF) dataset and a modelled precipitation dataset. I want to compare precipitation against SIF on a per-pixel basis within my study area. The two datasets cover the same region but have slightly different spatial resolutions. When I average over the whole region I can plot the values against each other successfully, but I am struggling to build a per-pixel scatter plot.
Honestly, I'm not sure this is the best way to compare these two quantities when looking for the effect of precipitation on SIF, so I'm open to different approaches. As for merging the data, I'm currently using xr.combine_by_coords, but it gives me an error that I describe below. I could also do this by converting the netCDFs to GeoTIFFs and then warping them with rasterio, but that seems like an inefficient way to make this comparison. Here is what I have so far:
import netCDF4
import numpy as np
import dask
import xarray as xr
rainy_bbox = np.array([
[-69.29519955115512,-13.861261028444734],
[-69.29519955115512,-12.384786628185896],
[-71.19583431678012,-12.384786628185896],
[-71.19583431678012,-13.861261028444734]])
max_lon_lat = np.max(rainy_bbox, axis=0)
min_lon_lat = np.min(rainy_bbox, axis=0)
# this dataset is available here: ftp://fluo.gps.caltech.edu/data/tropomi/gridded/
sif = xr.open_dataset('../data/TROPO_SIF_03-2018.nc')
# the dataset is global so subset to my study area in the Amazon
rainy_sif_xds = sif.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))
# this data can all be downloaded from NASA Goddard here either manually or with wget but you'll need an account on https://disc.gsfc.nasa.gov/: https://pastebin.com/viZckVdn
imerg_xds = xr.open_mfdataset('../data/3B-DAY.MS.MRG.3IMERG.201803*.nc4')
# spatial subset
rainy_imerg_xds = imerg_xds.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))
# I'm not sure the best way to combine these datasets but am trying this
combo_xds = xr.combine_by_coords([rainy_imerg_xds, rainy_sif_xds])
Currently I get a seemingly unhelpful RecursionError: maximum recursion depth exceeded in comparison on that last line. When I add the argument join='left', the data from the rainy_imerg_xds dataset ends up in combo_xds; when I add join='right', the rainy_sif_xds data is present; and if I use join='inner', no data is present at all. I assumed this function would do some internal interpolation, but it seems it doesn't.
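The join behaviour described above can be reproduced with a tiny synthetic example (nothing here is taken from the real data): joins align only exactly-equal coordinate labels and never interpolate, so two grids covering the same extent at different resolutions share no labels, an inner join comes back empty, and an outer join is mostly NaN.

```python
import numpy as np
import xarray as xr

# Two datasets over the same 0-1 extent but at different resolutions;
# no coordinate value appears in both grids.
coarse = xr.Dataset({"a": ("x", np.ones(4))},
                    coords={"x": np.linspace(0.1, 0.9, 4)})
fine = xr.Dataset({"b": ("x", np.ones(5))},
                  coords={"x": np.linspace(0.0, 1.0, 5)})

# Joins only match identical labels -- there is no implicit interpolation.
inner = xr.merge([coarse, fine], join="inner")  # no shared labels -> empty x
outer = xr.merge([coarse, fine], join="outer")  # union of labels, NaN-filled
print(inner.sizes["x"], outer.sizes["x"])  # -> 0 9
```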
This documentation from xarray outlines the solution to this problem quite simply: xarray lets you interpolate across multiple dimensions and use another Dataset's x and y coordinates as the output grid. In this case it is done with
# interpolation based on http://xarray.pydata.org/en/stable/interpolation.html
# interpolation can't be done across the chunked dimension so we have to load it all into memory
rainy_sif_xds.load()
#interpolate into the higher resolution grid from IMERG
interp_rainy_sif_xds = rainy_sif_xds.interp(lat=rainy_imerg_xds["lat"], lon=rainy_imerg_xds["lon"])
# visualize the output
rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Initial') +\
interp_rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Interpolated')
# now that the coordinates match, convert the IMERG dataset's default CFTimeIndex to a DatetimeIndex so its time axis is comparable with the SIF data's datetime64 time axis
rainy_imerg_xds['time'] = rainy_imerg_xds.indexes['time'].to_datetimeindex()
# now the merge can easily be done with
merged_xds = xr.combine_by_coords([rainy_imerg_xds, interp_rainy_sif_xds], coords=['lat', 'lon', 'time'], join="inner")
# now visualize the two datasets together; multiply SIF by 30 because the values are so low
merged_xds.HQprecipitation.rolling(time=7, center=True).sum().mean(dim=('lat', 'lon')).hvplot().relabel('Precip') * \
(merged_xds.dcSIF.mean(dim=('lat', 'lon'))*30).hvplot().relabel('SIF')
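Back to the original goal of a per-pixel scatter plot: once the grids are merged, every (time, lat, lon) cell holds one precipitation/SIF pair, so stacking those dimensions into a single sample axis yields two flat arrays ready for scattering. A minimal sketch on synthetic data (the variable names HQprecipitation and dcSIF mirror the merged dataset above, but the grid sizes and the linear relation between the fields are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
time = pd.date_range("2018-03-01", periods=10, freq="D")
lat = np.linspace(-13.8, -12.4, 15)
lon = np.linspace(-71.2, -69.3, 20)

# Synthetic stand-in for merged_xds: SIF loosely tracks precipitation.
precip = xr.DataArray(rng.gamma(2.0, 5.0, (10, 15, 20)),
                      dims=("time", "lat", "lon"),
                      coords={"time": time, "lat": lat, "lon": lon})
sif = 0.02 * precip + rng.normal(0.0, 0.1, precip.shape)
ds = xr.Dataset({"HQprecipitation": precip, "dcSIF": sif})

# Collapse every (time, lat, lon) cell into one sample axis.
flat = ds.stack(pixel=("time", "lat", "lon"))
x = flat.HQprecipitation.values
y = flat.dcSIF.values
mask = np.isfinite(x) & np.isfinite(y)  # drop any NaN pixels before plotting

# x[mask], y[mask] can now go straight into plt.scatter or hvplot.scatter;
# a quick correlation summarizes the per-pixel relationship.
r = np.corrcoef(x[mask], y[mask])[0, 1]
print(x[mask].size)
```

With the real data the same stack-then-flatten works directly on merged_xds, scattering HQprecipitation against dcSIF.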