Python xarray: select by lat/long and extract point data to a DataFrame
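I want to select all grid cells within a lat/long range and, for each grid cell, export it as a DataFrame and then to a CSV file (i.e. df.to_csv). My dataset is shown below. I can use xr.where(...) to mask out the grid cells outside my input range, but I am not sure how to loop over the remaining unmasked grid cells. Alternatively, I tried the xr.sel functions, but they do not seem to accept operators such as ds.sel(gridlat_0>45). xr.sel_points(...) might also work, but I cannot figure out the correct syntax for the indexers in my case. Thanks in advance for your help.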
<xarray.Dataset>
Dimensions:    (time: 48, xgrid_0: 685, ygrid_0: 485)
Coordinates:
    gridlat_0  (ygrid_0, xgrid_0) float32 44.6896 44.6956 44.7015 44.7075 ...
  * ygrid_0    (ygrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * xgrid_0    (xgrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * time       (time) datetime64[ns] 2016-07-28T01:00:00 2016-07-28T02:00:00 ...
    gridlon_0  (ygrid_0, xgrid_0) float32 -129.906 -129.879 -129.851 ...
Data variables:
    u          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    gridrot_0  (time, ygrid_0, xgrid_0) float32 nan nan nan nan nan nan nan ...
    Qli        (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    Qsi        (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    p          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    rh         (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    press      (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    t          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    vw_dir     (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
The simplest approach is probably to loop over every grid point, like this:
# (optionally) create a grid dataset so we don't need to pull out all
# the data from the main dataset before looking at each point
grid = ds[['gridlat_0', 'gridlon_0']]
for i in range(ds.coords['xgrid_0'].size):
    for j in range(ds.coords['ygrid_0'].size):
        sub_grid = grid.isel(xgrid_0=i, ygrid_0=j)
        if is_valid(sub_grid.gridlat_0, sub_grid.gridlon_0):
            sub_ds = ds.isel(xgrid_0=i, ygrid_0=j)
            sub_ds.to_dataframe().to_csv(...)
Even at 685x485, looping over every point should take only a few seconds.
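The loop above calls is_valid without defining it. A minimal sketch of such a helper follows, assuming a simple rectangular bounding box; the function body and the four bounds are hypothetical values chosen for illustration, not part of the original answer.

```python
# Hypothetical bounding box, chosen only for illustration.
LAT_MIN, LAT_MAX = 45.0, 50.0
LON_MIN, LON_MAX = -129.0, -120.0


def is_valid(lat, lon):
    """Return True when (lat, lon) lies inside the bounding box.

    float() accepts plain numbers as well as 0-dimensional arrays,
    such as the scalar values obtained after isel() on both grid axes.
    """
    lat = float(lat)
    lon = float(lon)
    return (LAT_MIN <= lat <= LAT_MAX) and (LON_MIN <= lon <= LON_MAX)
```

With these bounds, the first grid cell shown in the question (44.6896, -129.906) would be rejected, since it lies south and west of the box.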
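Pre-filtering with ds = ds.where(..., drop=True) (available in the next xarray release, due out later this week) could speed this up significantly, but you would still face the problem that the selected grid may not be representable on orthogonal axes.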
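To illustrate the limitation, here is a small sketch on a synthetic dataset. The array sizes, values, and the single variable t are assumptions made up for the example; only the dimension and coordinate names follow the question. Because latitude varies along both grid axes, where(..., drop=True) can only trim rows and columns that are entirely outside the selection, so some masked cells remain as NaN.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the dataset in the question (assumed sizes).
nt, ny, nx = 3, 4, 5
# Latitude changes along both axes, as it would on a rotated grid.
lat = 44.0 + np.arange(ny)[:, None] + 0.1 * np.arange(nx)[None, :]
lon = -130.0 + np.arange(nx)[None, :] + 0.0 * np.arange(ny)[:, None]
times = np.datetime64("2016-07-28T01:00") + np.arange(nt) * np.timedelta64(1, "h")

ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"),
           np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx))},
    coords={
        "time": times,
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon),
    },
)

mask = ds.gridlat_0 > 45
# drop=True removes only the rows/columns where the mask is False everywhere.
trimmed = ds.where(mask, drop=True)

n_selected = int(mask.sum())
# Cells that survive the trim but are still masked out (NaN) at one time step.
n_leftover_nan = int(trimmed["t"].isel(time=0).isnull().sum())
```

In this sketch the first row is dropped entirely, but one cell in the next row stays in the result as NaN, so a loop over the trimmed grid would still need to skip masked cells.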
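A final option, and probably the cleanest, is to use stack to convert the dataset to 2D. You can then use standard selection and groupby operations along the new 'space' dimension: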
ds_stacked = ds.stack(space=['xgrid_0', 'ygrid_0'])
ds_filtered = ds_stacked.sel(space=(ds_stacked.gridlat_0 > 45))
for _, ds_one_place in ds_filtered.groupby('space'):
    ds_one_place.to_dataframe().to_csv(...)
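A self-contained variant of the stack approach is sketched below on synthetic data. The sizes, values, the variable t, and the output file name pattern are assumptions for illustration. It swaps in positional selection (np.flatnonzero plus isel) and a plain loop in place of the boolean sel and groupby shown above, since behavior of boolean label selection on a stacked dimension may differ between xarray versions.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the dataset in the question (assumed sizes).
nt, ny, nx = 3, 4, 5
lat = 44.0 + np.arange(ny)[:, None] + 0.1 * np.arange(nx)[None, :]
lon = -130.0 + np.arange(nx)[None, :] + 0.0 * np.arange(ny)[:, None]
times = np.datetime64("2016-07-28T01:00") + np.arange(nt) * np.timedelta64(1, "h")

ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"),
           np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx))},
    coords={
        "time": times,
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon),
    },
)

# Collapse the two grid axes into one 'space' dimension: (time, space).
stacked = ds.stack(space=("ygrid_0", "xgrid_0"))

# Integer positions along 'space' of the cells to keep.
positions = np.flatnonzero((stacked.gridlat_0 > 45).values)
filtered = stacked.isel(space=positions)

frames = {}
for k, pos in enumerate(positions):
    # Stacking (ygrid_0, xgrid_0) is row-major, so recover the grid indices.
    j, i = divmod(int(pos), nx)
    point = filtered.isel(space=k)
    # Drop the leftover scalar 'space' label before converting, if present.
    df = point.drop_vars("space", errors="ignore").to_dataframe()
    frames[(j, i)] = df
    # To write one file per cell (hypothetical name pattern):
    # df.to_csv(f"point_y{j}_x{i}.csv")
```

Each entry in frames is a time-indexed DataFrame for one selected grid cell, so only the cells that pass the filter are ever visited.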