Find columns containing all NaN after grouping in pandas

Question

Given a DataFrame df, how can I find the columns that contain only NaN values within a group after grouping the rows?

In [97]: df
Out[97]:
     a    b  group
0  NaN  NaN  a
1  0.0  NaN  a
2  2.0  NaN  a
3  1.0  7.0  b
4  1.0  3.0  b
5  7.0  4.0  b
6  2.0  6.0  c
7  9.0  6.0  c
8  3.0  0.0  c
9  9.0  0.0  c

In this case, the desired output is group: a, column: b
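
To make the example reproducible, the frame above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the printed output, and the float dtype of columns a and b is assumed from the NaN entries.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame shown in the question.
df = pd.DataFrame({
    'a': [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    'b': [np.nan, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    'group': list('aaabbbcccc'),
})
print(df)
```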

Answer 1

First use set_index on the grouping column, then flag the NaNs with isnull.

Then groupby the index level and aggregate with all. Finally, reshape with stack and build a new DataFrame from the group and column names whose values are all NaN:

print (df.set_index('group').isnull().groupby('group').all())
           a      b
group              
a      False   True
b      False  False
c      False  False

a = df.set_index('group').isnull().groupby('group').all().stack()

b = pd.DataFrame(a[a].index.values.tolist(), columns=['group','cols'])
print (b)
  group cols
0     a    b
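
For reference, a self-contained variant of the same idea might look like the sketch below. It assumes the example frame from the question and groups the boolean mask by the original group column instead of going through set_index. The names all_nan, flags, pairs and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Example frame from the question (values copied from the printed output).
df = pd.DataFrame({
    'a': [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    'b': [np.nan, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    'group': list('aaabbbcccc'),
})

# True where every value of a column is NaN within a group.
all_nan = df.drop(columns='group').isna().groupby(df['group']).all()

# Reshape to a Series indexed by (group, column) and keep the True entries.
flags = all_nan.stack()
pairs = flags[flags].index.tolist()

result = pd.DataFrame(pairs, columns=['group', 'cols'])
print(result)
```

Grouping the mask by df['group'] keeps the group labels out of the isna step, so only the value columns are tested.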

Answer 2

Try this?

# min_count=1 keeps the sum of an all-NaN group as NaN (needed on pandas >= 0.22)
s = df.groupby('group').sum(min_count=1).unstack()
s[s.isnull()].reset_index()

  level_0 group   0
0       b     a NaN
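
One caveat with the sum-based approach: since pandas 0.22 the sum of an all-NaN group is 0 by default, so a NaN result only appears when min_count=1 is passed. A minimal sketch of the difference, assuming the example frame from the question (default_sum, strict_sum and found are illustrative names):

```python
import numpy as np
import pandas as pd

# Example frame from the question (values copied from the printed output).
df = pd.DataFrame({
    'a': [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    'b': [np.nan, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    'group': list('aaabbbcccc'),
})

# Default behaviour: an all-NaN group sums to 0, which hides the missing data.
default_sum = df.groupby('group').sum()

# min_count=1 requires at least one non-NaN value, otherwise the result is NaN.
strict_sum = df.groupby('group').sum(min_count=1)

print(default_sum.loc['a', 'b'])
print(strict_sum.loc['a', 'b'])

# (column, group) pairs whose sum is NaN, i.e. all values were NaN.
s = strict_sum.unstack()
found = s[s.isnull()].index.tolist()
print(found)
```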

Answer 3

Are you looking for this? That is, getting each group name together with the value columns that are entirely NaN.

vals = [(i['group'].iloc[0],i.columns[i.isnull().all()].tolist()) for _,i in df.groupby('group')]

Output:

[('a', ['b']), ('b', []), ('c', [])]
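
If only the groups that actually have an all-NaN column are of interest, the same loop can be written as a dict and filtered. This is a sketch assuming the example frame from the question; value_cols, all_nan_cols and non_empty are illustrative names.

```python
import numpy as np
import pandas as pd

# Example frame from the question (values copied from the printed output).
df = pd.DataFrame({
    'a': [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    'b': [np.nan, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    'group': list('aaabbbcccc'),
})

# Only test the value columns, not the grouping column itself.
value_cols = df.columns.drop('group')

# Map each group name to the list of columns that are entirely NaN in it.
all_nan_cols = {
    name: [c for c in value_cols if g[c].isna().all()]
    for name, g in df.groupby('group')
}

# Drop groups with no all-NaN column.
non_empty = {name: cols for name, cols in all_nan_cols.items() if cols}
print(non_empty)
```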

