python xarray select by lat/long and extract point data to dataframe

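I want to select all grid cells within a lat/long range and, for each grid cell, export it as a dataframe and then to a csv file (i.e. df.to_csv). My dataset is below. I can use xr.where(...) to mask out the grid cells outside my input range, but I'm not sure how to loop over the remaining, unmasked grid cells. Alternatively, I tried the xr.sel functions, but they don't seem to accept operators such as ds.sel(gridlat_0>45). xr.sel_points(...) might also work, but I can't figure out the correct syntax for the indexers in my case. Thanks in advance for your help.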

<xarray.Dataset>
Dimensions:    (time: 48, xgrid_0: 685, ygrid_0: 485)
Coordinates:
    gridlat_0  (ygrid_0, xgrid_0) float32 44.6896 44.6956 44.7015 44.7075 ...
  * ygrid_0    (ygrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * xgrid_0    (xgrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * time       (time) datetime64[ns] 2016-07-28T01:00:00 2016-07-28T02:00:00 ...
    gridlon_0  (ygrid_0, xgrid_0) float32 -129.906 -129.879 -129.851 ...
Data variables:
    u          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    gridrot_0  (time, ygrid_0, xgrid_0) float32 nan nan nan nan nan nan nan ...
    Qli        (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    Qsi        (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    p          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    rh         (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    press      (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    t          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    vw_dir     (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...

The simplest approach is probably to loop over every grid point, like this:

# (optionally) create a grid dataset so we don't need to pull out all
# the data from the main dataset before looking at each point
grid = ds[['gridlat_0', 'gridlon_0']]

for i in range(ds.coords['xgrid_0'].size):
    for j in range(ds.coords['ygrid_0'].size):
        sub_grid = grid.isel(xgrid_0=i, ygrid_0=j)
        # is_valid is a placeholder for your own lat/lon test
        if is_valid(sub_grid.gridlat_0, sub_grid.gridlon_0):
            sub_ds = ds.isel(xgrid_0=i, ygrid_0=j)
            sub_ds.to_dataframe().to_csv(...)

Even at 685x485, looping over every point should take only a few seconds.
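For reference, here is a self-contained sketch of the loop above that can be run as-is. It rests on several assumptions that are not in the original post: the tiny synthetic dataset stands in for the real one, `is_valid` is a hypothetical bounding-box test, and the output directory and file names are made up for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real dataset: 3 time steps on a 4 x 5 grid
# with 2D latitude/longitude coordinates (assumed values).
ny, nx = 4, 5
lat2d, lon2d = np.meshgrid(
    np.linspace(44.0, 47.0, ny), np.linspace(-130.0, -126.0, nx), indexing="ij"
)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"), rng.random((3, ny, nx)))},
    coords={
        "time": pd.date_range("2016-07-28 01:00", periods=3, freq=pd.Timedelta(hours=1)),
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat2d),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon2d),
    },
)


def is_valid(lat, lon):
    """Hypothetical predicate: keep points inside a lat/lon bounding box."""
    return 45.0 <= lat <= 46.5 and -129.5 <= lon <= -127.0


# Pull the 2D coordinates out once as NumPy arrays so the test is cheap.
lat = ds["gridlat_0"].values
lon = ds["gridlon_0"].values

out_dir = Path(tempfile.mkdtemp())  # illustrative output location
written = []
for j in range(ds.sizes["ygrid_0"]):
    for i in range(ds.sizes["xgrid_0"]):
        if is_valid(lat[j, i], lon[j, i]):
            # One grid cell: a dataset that only has the time dimension left.
            sub_ds = ds.isel(ygrid_0=j, xgrid_0=i)
            path = out_dir / f"point_y{j}_x{i}.csv"
            sub_ds.to_dataframe().to_csv(path)
            written.append(path)
```

Each CSV then holds one row per time step, with the scalar grid coordinates repeated as columns.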

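Pre-filtering with ds = ds.where(..., drop=True) (available in the next xarray release, due out later this week) could speed this up significantly, but you would still face the problem that the selected grid cells may not be representable on orthogonal axes.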
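To make the orthogonal-axes caveat concrete, here is a small sketch on a synthetic grid. The dataset and the triangular mask are assumptions chosen only for illustration. With drop=True, rows and columns that contain no selected cell are dropped, but cells inside the remaining bounding box that fail the condition are kept and filled with NaN.

```python
import numpy as np
import xarray as xr

# Synthetic 4 x 5 grid with 2D lat/lon coordinates (assumed values).
ny, nx = 4, 5
lat2d, lon2d = np.meshgrid(
    np.linspace(44.0, 47.0, ny), np.linspace(-130.0, -126.0, nx), indexing="ij"
)
ds = xr.Dataset(
    {"t": (("ygrid_0", "xgrid_0"), np.ones((ny, nx)))},
    coords={
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat2d),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon2d),
    },
)

# A deliberately non-rectangular (triangular) selection.
mask = (ds["gridlat_0"] >= 45.0) & (ds["gridlon_0"] + ds["gridlat_0"] <= -82.0)

# drop=True trims rows/columns that are entirely outside the mask.
filtered = ds.where(mask, drop=True)

n_kept = int(filtered["t"].notnull().sum())  # cells that satisfy the mask
n_nan = int(filtered["t"].isnull().sum())    # padding cells inside the box
print(dict(filtered.sizes), n_kept, n_nan)
```

The result is smaller than the original grid, so any later loop has fewer cells to visit, but it still has to skip the NaN-padded ones.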

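A final option, and possibly the cleanest, is to use stack to convert the dataset to 2D. You can then use standard selection and groupby operations along the new 'space' dimension: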

ds_stacked = ds.stack(space=['xgrid_0', 'ygrid_0'])
ds_filtered = ds_stacked.sel(space=(ds_stacked.gridlat_0 > 45))
for _, ds_one_place in ds_filtered.groupby('space'):
    ds_one_place.to_dataframe().to_csv(...)
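Below is a runnable variant of the stacked approach on a synthetic dataset. It is a sketch under assumptions, not the exact code above: the dataset, the output paths, and the 45-degree threshold are illustrative, and instead of boolean .sel plus groupby it uses a boolean isel and then reads the surviving (ygrid_0, xgrid_0) pairs back from the stacked coordinate, which may behave more consistently across xarray versions.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: 3 time steps on a 4 x 5 grid (assumed values).
ny, nx = 4, 5
lat2d, lon2d = np.meshgrid(
    np.linspace(44.0, 47.0, ny), np.linspace(-130.0, -126.0, nx), indexing="ij"
)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"), rng.random((3, ny, nx)))},
    coords={
        "time": pd.date_range("2016-07-28 01:00", periods=3, freq=pd.Timedelta(hours=1)),
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat2d),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon2d),
    },
)

# Collapse the two grid dimensions into a single 'space' dimension.
ds_stacked = ds.stack(space=["ygrid_0", "xgrid_0"])

# Boolean positional selection along 'space' using the stacked latitude.
keep = (ds_stacked["gridlat_0"] > 45.0).values
ds_filtered = ds_stacked.isel(space=keep)

# Recover the grid labels of the surviving points from the stacked index.
y_labels = ds_filtered["ygrid_0"].values
x_labels = ds_filtered["xgrid_0"].values

out_dir = Path(tempfile.mkdtemp())  # illustrative output location
written = []
for j, i in zip(y_labels, x_labels):
    # Select the single cell from the original (unstacked) dataset.
    point = ds.sel(ygrid_0=j, xgrid_0=i)
    path = out_dir / f"point_y{j}_x{i}.csv"
    point.to_dataframe().to_csv(path)
    written.append(path)
```

Because the filter runs on a 1D 'space' dimension, a non-rectangular selection needs no NaN padding: only the points that pass the test remain.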