Pandas Groupby: selection where both subgroups exist

Question

My DataFrame has the following form.

id group color
i1 aa    white
i1 aa    white
i1 ab    white
i1 ab    black
...

I apply a groupby as follows:

groupdf = df.groupby(['id', 'group'])['color'].value_counts()

The result of the groupby has a MultiIndex.

               value
id group color 

i1  aa   white  2
i1  ab   white  1
i1  ab   black  3
i1  ac   black  5
i1  ad   white  4
i1  ad   black  5

i2  aa   white  1
i2  aa   black  1
i2  bb   black  1
i2  cc   white  2
i2  cc   black  6
i2  ad   black  5

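My goal is to

select the entries where both categories of the last index level, color, are present, and then
select the group with the largest black count, so that the result looks like: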

    value
id

i1  5    #only groups ab and ad have both colors; ad.black = 5 > ab.black = 3
i2  6    #only groups aa and cc have both colors; cc.black = 6 > aa.black = 1

I have tried .xs() and .index.get_level_values(), but I could not achieve my goal.

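Edit 1: I realize I gave little information above about how the DataFrame is obtained, so I have updated the question. I cannot simply call .max() because the original df has no value column.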
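To make the setup in the edit concrete, here is a minimal sketch that rebuilds a raw frame without a value column and applies the groupby from the question. The raw rows are an assumption: they are expanded from the counts shown in the table above, since the question elides most of the original rows.

```python
import pandas as pd

# Counts copied from the table in the question: (id, group, color, count).
counts = [
    ("i1", "aa", "white", 2), ("i1", "ab", "white", 1), ("i1", "ab", "black", 3),
    ("i1", "ac", "black", 5), ("i1", "ad", "white", 4), ("i1", "ad", "black", 5),
    ("i2", "aa", "white", 1), ("i2", "aa", "black", 1), ("i2", "bb", "black", 1),
    ("i2", "cc", "white", 2), ("i2", "cc", "black", 6), ("i2", "ad", "black", 5),
]

# Hypothetical raw frame: one row per observation, no value column.
rows = [(i, g, c) for i, g, c, n in counts for _ in range(n)]
df = pd.DataFrame(rows, columns=["id", "group", "color"])

# The groupby from the question; the counts exist only in this result.
groupdf = df.groupby(["id", "group"])["color"].value_counts()
print(groupdf)
```

The counts live only in groupdf, which is why a plain .max() on df has nothing to operate on.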

Answer 1

Let's try:

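# assumes df is the counts table as a flat DataFrame with columns
# id, group, color, value (for example groupdf.reset_index(name='value'))

# mask the (id, group) pairs that have more than one color
s = df.groupby(['id','group'])['value'].transform('size') > 1

# keep those groups, select the black rows, then take the max per id
df[s].query('color=="black"').groupby(['id','color'])['value'].max()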

Output:

id  color
i1  black    5
i2  black    6
Name: value, dtype: int64
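The answer works on a flat frame that has a value column, while the question starts from a MultiIndex Series. The sketch below bridges that gap under one assumption: the Series is rebuilt by hand from the table in the question and flattened with reset_index. The names index and result are illustrative and do not come from the original post.

```python
import pandas as pd

# Counts from the question's table, keyed by (id, group, color).
index = pd.MultiIndex.from_tuples(
    [
        ("i1", "aa", "white"), ("i1", "ab", "white"), ("i1", "ab", "black"),
        ("i1", "ac", "black"), ("i1", "ad", "white"), ("i1", "ad", "black"),
        ("i2", "aa", "white"), ("i2", "aa", "black"), ("i2", "bb", "black"),
        ("i2", "cc", "white"), ("i2", "cc", "black"), ("i2", "ad", "black"),
    ],
    names=["id", "group", "color"],
)
groupdf = pd.Series([2, 1, 3, 5, 4, 5, 1, 1, 1, 2, 6, 5], index=index, name="value")

# Flatten the MultiIndex so the counts become an ordinary 'value' column.
df = groupdf.reset_index(name="value")

# True for rows whose (id, group) pair has more than one color row.
s = df.groupby(["id", "group"])["value"].transform("size") > 1

# Keep those pairs, select the black rows, and take the largest count per id.
result = df[s].query('color == "black"').groupby("id")["value"].max()
print(result)
```

The size > 1 test relies on the counts table holding one row per (id, group, color) and on there being only two colors; with more colors, a check on the specific color names would be needed instead.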
