Masking out NaNs from multiple xarray.Datasets in Python
How can I mask out NaNs from multiple xarray Datasets that share the same shape, so that I keep only the common region where none of them has a NaN?
import numpy as np
import pandas as pd
import xarray as xr
arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df1 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df1.iloc[[2, 3, 2], :] = np.nan
ds1 = df1.to_xarray()
arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df2 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df2.iloc[[1, 4, 1], :] = np.nan
ds2 = df2.to_xarray()
arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df3 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df3.iloc[[2, 1, 1], :] = np.nan
ds3 = df3.to_xarray()
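In the example datasets above, I set NaNs in different rows of each dataset. I want to mask out every row where any of the datasets has a NaN. The expected result is then a data frame without the second through fifth rows from the top, which looks like: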
df3.iloc[[0, 5, 6, 7], :]
Although I described this with pd.DataFrame for convenience and visualization, I want to do the operation on the xarray.Dataset structure. My attempt used something like xr.Dataset.where() ...
ds1_masked = ds1.where(ds1 != np.nan and ds2 != np.nan and ds3 != np.nan,
drop=True)
This does not work (it creates a Dataset without any variables).
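A likely contributor to the failure: NaN never compares equal to anything, including itself, so `ds != np.nan` is True everywhere, and Python's `and` does not combine arrays elementwise. The sketch below illustrates the difference on a small toy pair of DataArrays (the names `a`, `b`, `naive`, and `valid` are hypothetical and not part of the question's data):

```python
import numpy as np
import xarray as xr

# Hypothetical toy data, not the datasets from the question.
a = xr.DataArray([1.0, np.nan, 3.0], dims="x")
b = xr.DataArray([np.nan, 2.0, 3.0], dims="x")

# NaN compares unequal to everything, so this marks every element as "valid".
naive = a != np.nan

# notnull() tests for NaN properly; & combines the two masks elementwise.
valid = a.notnull() & b.notnull()

print(naive.values)  # [ True  True  True]
print(valid.values)  # [False False  True]
```

Using `notnull()` (or `np.isnan`) together with `&` and `|` gives an elementwise mask that can then be passed to `where()`.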
Here is my solution:
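# True where variable 0 is non-NaN in all three datasets (ds1[0] selects the variable named 0)
mask = ~(np.isnan(ds1[0]) | np.isnan(ds2[0]) | np.isnan(ds3[0]))
ds1_mask_nan = ds1.where(mask, np.nan)
# Drop the coordinate labels that are NaN everywhere
ds1_mask_out = ds1_mask_nan.where(~np.isnan(ds1_mask_nan[0]), drop=True)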
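The solution above checks only variable 0, and because the dataset is a rectangular level_0 × level_1 grid, `drop=True` can only remove labels that are NaN everywhere. One possible variation, sketched here under assumptions, builds the mask from every variable of every dataset and stacks the two index levels into a single dimension so that individual rows can be dropped. The helper `make_dataset`, the dimension name `row`, and the fixed seeds are hypothetical; `level_0` and `level_1` are the dimension names `to_xarray()` assigns to an unnamed MultiIndex.

```python
import functools
import operator

import numpy as np
import pandas as pd
import xarray as xr

index = pd.MultiIndex.from_product([["bar", "baz", "foo", "qux"], ["one", "two"]])

def make_dataset(nan_rows, seed):
    # Hypothetical helper: rebuilds the question's sample data with a fixed seed.
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.standard_normal((8, 4)), index=index)
    df.iloc[nan_rows, :] = np.nan
    return df.to_xarray()

ds1 = make_dataset([2, 3], seed=0)
ds2 = make_dataset([1, 4], seed=1)
ds3 = make_dataset([1, 2], seed=2)

# One boolean DataArray per variable per dataset, combined with elementwise AND.
masks = [ds[name].notnull() for ds in (ds1, ds2, ds3) for name in ds.data_vars]
mask = functools.reduce(operator.and_, masks)

# Stack both index levels into one "row" dimension so single rows can be removed.
stacked = ds1.stack(row=("level_0", "level_1"))
stacked_mask = mask.stack(row=("level_0", "level_1"))
common = stacked.isel(row=stacked_mask.values)

print(common.sizes["row"])  # 4
```

The same `stacked_mask` can be applied to `ds2` and `ds3` after stacking them the same way. Calling `common.unstack("row")` would restore the two-level grid, but the missing combinations would come back as NaN, so the stacked form is the one that keeps only the NaN-free rows.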