Masking out NaNs from multiple xarray.Datasets in Python

Question

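How can I mask out NaNs across multiple xarray datasets that share the same shape, so that I keep only the positions that contain no NaN in any of them?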

import numpy as np
import pandas as pd
import xarray as xr

arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df1 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df1.iloc[[2, 3, 2], :] = np.nan
ds1 = df1.to_xarray()

arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df2 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df2.iloc[[1, 4, 1], :] = np.nan
ds2 = df2.to_xarray()

arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df3 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df3.iloc[[2, 1, 1], :] = np.nan
ds3 = df3.to_xarray()

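In the example datasets above, I set NaN in different rows (across all columns) of each dataset. I want to mask out every row where any of the datasets has a NaN. The expected result is then a data frame without the second through fifth rows from the top, which looks like: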

df3.iloc[[0, 5, 6, 7], :]

Although I described this with pd.DataFrame for convenience and visualization, I want to do this on the xarray.Dataset structure. My attempt uses xr.Dataset.where(), something like:

ds1_masked = ds1.where(ds1 != np.nan and ds2 != np.nan and ds3 != np.nan, 
drop=True)

This does not work (it creates a dataset without any variables).

Answer 1

Here is my solution:

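# True where variable 0 is non-NaN in all three datasets
mask = ~(np.isnan(ds1[0].values) | np.isnan(ds2[0].values) | np.isnan(ds3[0].values))
# Set the masked positions to NaN, then drop labels that are entirely NaN
ds1_mask_nan = ds1.where(mask, np.nan)
ds1_mask_out = ds1_mask_nan.where(~np.isnan(ds1_mask_nan[0]), drop=True)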
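
The attempt in the question fails for two reasons: NaN never compares equal to anything, so `x != np.nan` is True everywhere (NaN included), and the Python keyword `and` is not elementwise. A more compact variant of the same idea, as a minimal sketch: it assumes the data layout from the question, rebuilds the three datasets with a fixed seed, and checks every variable rather than only variable 0. The helper `make_ds` and the names `valid` and `mask` are illustrative only, not part of xarray.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-ins for ds1, ds2, ds3 (assumption: same layout as in the question,
# with NaN written into whole rows; the seed is arbitrary).
index = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
         np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
rng = np.random.default_rng(0)

def make_ds(nan_rows):
    # Hypothetical helper: build one DataFrame and convert it to a Dataset.
    df = pd.DataFrame(rng.standard_normal((8, 4)), index=index)
    df.iloc[nan_rows, :] = np.nan
    return df.to_xarray()

ds1, ds2, ds3 = make_ds([2, 3]), make_ds([1, 4]), make_ds([1, 2])

# Comparing against np.nan is True everywhere, so it cannot act as a mask.
always_true = bool((ds1[0] != np.nan).all())

# Elementwise "valid in every dataset" test; & replaces the keyword `and`.
valid = ds1.notnull() & ds2.notnull() & ds3.notnull()

# Collapse the per-variable masks into one DataArray over the shared dims.
mask = valid.to_array().all("variable")

# Keep valid positions; drop labels whose entire slice is masked.
ds1_masked = ds1.where(mask, drop=True)
```

Note that `to_xarray()` unstacks the two index levels into a 2-D grid (`level_0` by `level_1`), so `drop=True` only removes a label when its whole slice is masked. Here that removes `baz`, while the masked cells at (`bar`, `two`) and (`foo`, `one`) remain in the result as NaN. The same `mask` can be applied to `ds2` and `ds3`.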

python-xarray