How to add groups to a groupby

Question

My hypothetical DataFrame is:

import pandas as pd

df = pd.DataFrame({'col1': [91, 91, 91, 91, 92, 92, 92, 92],
                   'col2': [91, 92] * 4, 'value': [10] * 8})
df

   col1  col2  value
0    91    91     10
1    91    92     10
2    91    91     10
3    91    92     10
4    92    91     10
5    92    92     10
6    92    91     10
7    92    92     10

Grouping by both columns produces these groups:

grouped = df.groupby(['col1','col2'])
grouped.groups
{(91, 91): Int64Index([0, 2], dtype='int64'),
 (91, 92): Int64Index([1, 3], dtype='int64'),
 (92, 91): Int64Index([4, 6], dtype='int64'),
 (92, 92): Int64Index([5, 7], dtype='int64')}

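I want to extend this set of groups so that I can aggregate over the extended selection.
Suppose I want to add the groups generated by: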

groupedall = df.groupby(['col1'])
groupedall.groups
{91: Int64Index([0, 1, 2, 3], dtype='int64'),
 92: Int64Index([4, 5, 6, 7], dtype='int64')}

Here is my attempt: I substitute 99 for the col2 value (where 99 means "any"),

groupedall.groups[(91, 99)] = groupedall.groups.pop(91)
groupedall.groups[(92, 99)] = groupedall.groups.pop(92)

and then add these new groups to my original groups dictionary.

grouped.groups.update(groupedall.groups)
grouped.groups
{(91, 91): Int64Index([0, 2], dtype='int64'),
 (91, 92): Int64Index([1, 3], dtype='int64'),
 (91, 99): Int64Index([0, 1, 2, 3], dtype='int64'),
 (92, 91): Int64Index([4, 6], dtype='int64'),
 (92, 92): Int64Index([5, 7], dtype='int64'),
 (92, 99): Int64Index([4, 5, 6, 7], dtype='int64')}

But when I try to aggregate the grouped object, the newly added groups are ignored.

grouped.sum()
               value
col1    col2    
91      91      20
        92      20
92      91      20
        92      20

I would like the output to include the groups I just added:

               value
col1    col2    
91      91      20
        92      20
        99      40
92      91      20
        92      20
        99      40

What am I missing here?

Answer 1

The key here seems to be that you are manually adding groups to a DataFrameGroupBy object.

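This appears to work when you look at grouped.groups, but when you look at any other attribute of grouped, it becomes clear that the new groups are not treated as groups.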
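A minimal sketch of that observation, assuming a recent pandas release. The key (91, 99) is the "any" marker from the question, and the exact caching behaviour of grouped.groups may differ between pandas versions; what the sketch relies on is only that the aggregation is not driven by that dictionary.

```python
import pandas as pd

df = pd.DataFrame({'col1': [91, 91, 91, 91, 92, 92, 92, 92],
                   'col2': [91, 92] * 4, 'value': [10] * 8})
grouped = df.groupby(['col1', 'col2'])

# Add a made-up key to the groups mapping, as in the question.
grouped.groups[(91, 99)] = df.index[df['col1'] == 91]

# The aggregation is computed from the grouping keys themselves,
# not from the (possibly edited) mapping.
n_groups = grouped.ngroups
result = grouped.sum()

print(n_groups)                   # 4
print((91, 99) in result.index)   # False
```

The group count and the aggregated index still reflect only the four combinations of col1 and col2 that actually occur in the data.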

It does not seem possible to modify a DataFrameGroupBy this way, but using the link provided by @QuickBeam2k1, you can get the data you want like this:

df.pivot_table(
    index='col1',
    columns='col2',
    values='value',
    aggfunc='sum',
    margins=True
)

which returns:

col2    91      92      All
col1            
91      20.0    20.0    40.0
92      20.0    20.0    40.0
All     40.0    40.0    80.0
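If the long layout from the question is preferred over the wide table, the pivot can be reshaped. The following is only a sketch: relabelling the 'All' column as 99 and dropping the 'All' row are choices made here to mirror the question's convention, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [91, 91, 91, 91, 92, 92, 92, 92],
                   'col2': [91, 92] * 4, 'value': [10] * 8})

table = df.pivot_table(index='col1', columns='col2', values='value',
                       aggfunc='sum', margins=True)

# Drop the grand-total row, relabel the per-row total as 99 ("any"),
# then move the column labels into a second index level.
long_form = (
    table.drop(index='All')
         .rename(columns={'All': 99})
         .stack()
)
print(long_form)
```

The result is a Series indexed by (col1, col2), with the per-col1 totals stored under col2 == 99.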

Answer 2

Option 1

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
pd.concat([df, df.assign(col2=99)]).groupby(['col1', 'col2']).sum()

           value
col1 col2       
91   91       20
     92       20
     99       40
92   91       20
     92       20
     99       40

Option 2

dummy_series = pd.Series(99, df.index, name='col2')

pd.concat([
    df.groupby(['col1', 'col2']).sum(),
    df.groupby(['col1', dummy_series])[['value']].sum()
]).sort_index()

           value
col1 col2       
91   91       20
     92       20
     99       40
92   91       20
     92       20
     99       40
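As a quick sanity check, the two options can be run side by side. This sketch assumes pandas 2.x, so Option 1 is written with pd.concat rather than DataFrame.append; the names opt1 and opt2 are introduced here only for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'col1': [91, 91, 91, 91, 92, 92, 92, 92],
                   'col2': [91, 92] * 4, 'value': [10] * 8})

# Option 1: stack a copy of the data whose col2 is the marker 99, then group.
opt1 = pd.concat([df, df.assign(col2=99)]).groupby(['col1', 'col2']).sum()

# Option 2: aggregate twice (by both columns, then by col1 plus a constant
# col2 Series) and concatenate the two results.
dummy_series = pd.Series(99, index=df.index, name='col2')
opt2 = pd.concat([
    df.groupby(['col1', 'col2']).sum(),
    df.groupby(['col1', dummy_series])[['value']].sum(),
]).sort_index()

same_index = list(opt1.index) == list(opt2.index)
same_values = opt1['value'].tolist() == opt2['value'].tolist()
print(same_index, same_values)
```

Option 1 duplicates the rows before grouping, while Option 2 aggregates twice and never copies the raw data, which may matter for large frames.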
