How do I filter multi-level columns using notnull() in pandas?

Question

I generated a DataFrame with MultiIndex columns and some NaN values using this:

import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples)
a = np.random.randn(3, 8)
# Set roughly 30% of the cells to NaN
mask = np.random.choice([1, 0], a.shape, p=[.3, .7]).astype(bool)
a[mask] = np.nan
df = pd.DataFrame(a, columns=index)
df

This produces something like the following:

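I want to get the rows in which a specific subset of top-level columns (e.g. df[['baz','qux']]) has no null values. For example, for df[['baz','qux']] I would want rows 0 and 1, since neither contains a null in those columns.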

I was hoping this would work the same way as on an ordinary df, so I tried:

cols = ['bar','baz']
df[cols].loc[df[cols].notnull()]

But I'm clearly missing something:

ValueError: Cannot index with multidimensional key

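The pandas documentation on MultiIndex / advanced indexing explains how to index and slice this kind of DataFrame, but it doesn't seem to cover .loc/lookups/filtering. So I assume I'm looking in the wrong place, but I can't find any results or resources on this.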

Answer 1

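df[cols].notna() is not a one-dimensional boolean mask; it is a DataFrame with one boolean per cell. You have to reduce it along an axis with all or any to get one boolean per row.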

>>> df[df[cols].notna().all(axis=1)]

        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
0  1.799680 -0.901705 -1.575930  0.185863 -0.793007  1.485423       NaN       NaN
2  1.379878 -0.748599  0.661697 -1.015311 -0.858144       NaN -1.623013  0.340043
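To make the difference between all and any concrete, here is a minimal sketch on a small hand-written frame. The values and the variable names (mask_2d, rows_all, rows_any, complete_rows, partial_rows) are illustrative assumptions, not taken from the question's random data.

```python
import numpy as np
import pandas as pd

# Small deterministic frame; the values are made up for illustration.
columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0],              # row 0: no NaN
        [np.nan, 2.0, 3.0, 4.0],           # row 1: one NaN under "bar"
        [np.nan, np.nan, np.nan, np.nan],  # row 2: all NaN
    ],
    columns=columns,
)

cols = ["bar", "baz"]
mask_2d = df[cols].notna()       # DataFrame: one boolean per cell
rows_all = mask_2d.all(axis=1)   # Series: True where every cell is non-null
rows_any = mask_2d.any(axis=1)   # Series: True where at least one cell is non-null

complete_rows = df[rows_all]     # rows with no NaN in the selected columns
partial_rows = df[rows_any]      # rows with at least one non-NaN value

print(mask_2d.ndim, rows_all.ndim)   # 2 1
print(complete_rows.index.tolist())  # [0]
print(partial_rows.index.tolist())   # [0, 1]
```

all(axis=1) matches the question's "no nulls in these columns" requirement; any(axis=1) only drops rows that are entirely null in the selected columns.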


python

multi-index

pandas