Combining 2 Xarray DataArrays along 2 dimensions (to obtain a finer grid from a coarse grid)
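I have 2 DataArrays that I need to combine, but for some reason it does not work. (The goal is to create a grid with a finer resolution (x2).)
The first array, da_1, contains the source data: the values z for the coordinate pairs (x, y):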
da_1 :
<xarray.DataArray (x: 3, y: 2)>
array([[1, 2],
       [3, 4],
       [5, 6]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 8 9
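Then I created a second array, da_2: it has the same properties as da_1, but its grid has a finer resolution (x2). The x coordinates are [0, 1, 2] in da_1 and [0, 0.5, 1, 1.5, 2] in da_2. For the y coordinates, [8, 9] becomes [8, 8.5, 9]. The z values are all NaN.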
da_2 :
<xarray.DataArray (x: 5, y: 3)>
array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan]])
Coordinates:
  * x        (x) float64 0.0 0.5 1.0 1.5 2.0
  * y        (y) float64 8.0 8.5 9.0
Finally, I need to replace the NaNs in da_2 with the values from da_1 wherever the (x, y) coordinate pairs match: (0,8), (0,9), (1,8), (1,9), (2,8) and (2,9).
Expected result:
<xarray.DataArray (x: 5, y: 3)>
array([[ 1., nan,  2.],
       [nan, nan, nan],
       [ 3., nan,  4.],
       [nan, nan, nan],
       [ 5., nan,  6.]])
Coordinates:
  * x        (x) float64 0.0 0.5 1.0 1.5 2.0
  * y        (y) float64 8.0 8.5 9.0
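To do this, I tried combining them with xarray.combine_by_coords(), but it failed. Calling combine_by_coords([da_1, da_2]) returns da_2 instead of the expected merged DataArray, and combine_by_coords([da_2, da_1]) returns da_1.
I tried all the join methods, without success.
Do you know how to get the expected result (da_2 with the values of da_1)?
Reproducible example (a visual representation of the data is given below):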
import xarray as xr

# Create first DataArray
da_1 = xr.DataArray([[1, 2], [3, 4], [5, 6]], dims=("x", "y"),
                    coords={"x": [0, 1, 2], "y": [8, 9]})
print(da_1)
print("*" * 50)

# Create second DataArray
nan = float("NaN")
da_2_data = [[nan, nan, nan],
             [nan, nan, nan],
             [nan, nan, nan],
             [nan, nan, nan],
             [nan, nan, nan]]
da_2 = xr.DataArray(da_2_data, dims=("x", "y"),
                    coords={"x": [0, 0.5, 1, 1.5, 2], "y": [8, 8.5, 9]})
print(da_2)
print("*" * 50)

# Trying to combine
combined = xr.combine_by_coords([da_1, da_2])
print(combined)
print("*" * 50)

expected_data = [[1, nan, 2],
                 [nan, nan, nan],
                 [3, nan, 4],
                 [nan, nan, nan],
                 [5, nan, 6]]
# Expected output (grid with resolution x2)
expected = xr.DataArray(expected_data, dims=("x", "y"),
                        coords={"x": [0, 0.5, 1, 1.5, 2], "y": [8, 8.5, 9]})
print(expected)
print("*" * 50)

# If all is OK, we should get the same results as in da_1 for identical coordinates
x0_y8 = expected.sel(x=0, y=8).values
x0_y9 = expected.sel(x=0, y=9).values
x1_y8 = expected.sel(x=1, y=8).values
x1_y9 = expected.sel(x=1, y=9).values
x2_y8 = expected.sel(x=2, y=8).values
x2_y9 = expected.sel(x=2, y=9).values
assert x0_y8 == 1
assert x0_y9 == 2
assert x1_y8 == 3
assert x1_y9 == 4
assert x2_y8 == 5
assert x2_y9 == 6
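One solution is to take advantage of the link between xarray and pandas. You can check the code below. The only concern is speed if your data is very large, such as the dataframes with billions of rows found in climate science. For other ordinary datasets, the approach below should be fine.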
# import packages
import xarray as xr
import pandas as pd
import numpy as np

# construct your sample data
da_1 = xr.DataArray([[1, 2], [3, 4], [5, 6]], dims=("x", "y"),
                    coords={"x": [0, 1, 2], "y": [8, 9]})
nan = float("NaN")
da_2_data = [[nan, nan, nan],
             [nan, nan, nan],
             [nan, nan, nan],
             [nan, nan, nan],
             [nan, nan, nan]]
da_2 = xr.DataArray(da_2_data, dims=("x", "y"),
                    coords={"x": [0, 0.5, 1, 1.5, 2], "y": [8, 8.5, 9]})

# build a function to convert an xarray object to a pandas dataframe
def xr_to_df(input_xr):
    df = input_xr.to_dataframe()
    # turn the (x, y) index levels into ordinary columns
    df = df.reset_index(drop=False)
    return df

# assign names to the variables in "da_1" and "da_2"
# so you can combine them later
da_1 = da_1.rename("da_1")
da_2 = da_2.rename("da_2")

# convert both to pandas dataframes
da_1_df = xr_to_df(da_1)
da_2_df = xr_to_df(da_2)

# right-merge on the shared coordinate columns: every (x, y) row of "da_2" is kept,
# and values from "da_1" are matched on coordinates
da_df_combined = pd.merge(da_1_df, da_2_df, how='right', on=["x", "y"])
print(da_df_combined)

# from here, convert the above dataframe back to xarray
# first get the unique x and y values
# (np.unique returns them sorted from min to max)
x = np.unique(da_df_combined['x'])
y = np.unique(da_df_combined['y'])
print("x:", x)
print("y:", y)

# then reshape the data to match the way it is structured (x rows, y columns)
da_1_reshape = da_df_combined['da_1'].values.reshape(len(x), len(y))

# generate the DataArray and provide a name for the variable
# since you are only interested in values from "da_1", here we do "da_1" only
da_1_xr = xr.DataArray(da_1_reshape, coords=[('x', x), ('y', y)])
da_1_xr = da_1_xr.rename("da_1")

# check your results
print(da_1_xr)

# use your own checks to double-check the values
x0_y8 = da_1_xr.sel(x=0, y=8).values
x0_y9 = da_1_xr.sel(x=0, y=9).values
x1_y8 = da_1_xr.sel(x=1, y=8).values
x1_y9 = da_1_xr.sel(x=1, y=9).values
x2_y8 = da_1_xr.sel(x=2, y=8).values
x2_y9 = da_1_xr.sel(x=2, y=9).values
assert x0_y8 == 1
assert x0_y9 == 2
assert x1_y8 == 3
assert x1_y9 == 4
assert x2_y8 == 5
assert x2_y9 == 6
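As a possible alternative to the pandas round trip above, the same result can probably be obtained inside xarray itself. The sketch below is not part of the original answer. It assumes the sample data from the question and uses two existing DataArray methods, reindex_like and combine_first. The names fine and fine_cf are only illustrative.

```python
import numpy as np
import xarray as xr

# Coarse source grid (same sample data as in the question)
da_1 = xr.DataArray(
    [[1, 2], [3, 4], [5, 6]],
    dims=("x", "y"),
    coords={"x": [0, 1, 2], "y": [8, 9]},
)

# Fine target grid, filled with NaN
da_2 = xr.DataArray(
    np.full((5, 3), np.nan),
    dims=("x", "y"),
    coords={"x": [0, 0.5, 1, 1.5, 2], "y": [8, 8.5, 9]},
)

# Option 1 (illustrative name "fine"): place da_1 on da_2's coordinates.
# Grid points that do not exist in da_1 are filled with NaN.
fine = da_1.reindex_like(da_2)

# Option 2 (illustrative name "fine_cf"): take da_1 where it has a value,
# and fall back to da_2 (here NaN) on the union of both coordinate sets.
fine_cf = da_1.combine_first(da_2)

print(fine)
print(fine_cf)
```

With reindex_like, da_2 only supplies the target coordinates, so its values are never read. combine_first is the better fit if da_2 already holds some values that should be kept where da_1 has none.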