Use entire groupby object on custom function

Question

I have a DataFrame that looks like this:

df =

date           col1      col2      col3
---------------------------------------
2022/03/01     1         5         10
2022/03/01     3         6         12
2022/03/01     5         7         14
2022/03/02     6         8         15
2022/03/02     2         9         17
2022/03/02     8         10        19
2022/03/03     2         11        21
2022/03/03     10        12        22
2022/03/03     9         13        23

And I have a function that looks like this:

def my_func(df):
    # <do something with the `df` given to the function>

    return result

So in my case, result is just a float, computed by doing a few things with the DataFrame passed to the function.

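What I want to do is group the original DataFrame by date, use each group as the input to the function, and return the computed value on every row of that group, i.e. the resulting DataFrames would look like: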

df_group_object1 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/01     1         5         10       15
2022/03/01     3         6         12       15
2022/03/01     5         7         14       15


df_group_object2 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/02     6         8         15       25
2022/03/02     2         9         17       25
2022/03/02     8         10        19       25


df_group_object3 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/03     2         11        21       56
2022/03/03     10        12        22       56
2022/03/03     9         13        23       56

The values in the result column are just random numbers I typed in. The actual values would come from my_func.

My idea was to do something like this:

df["result"] = df.groupby(["date"]).transform(my_func)

But it seems that what gets passed to the function from the groupby is not the entire DataFrame for that group.

So is there a way to do this?

Answer 1

Assuming you want to operate on each grouped DataFrame and then collect the results, you can simply use a for loop over the groupby object:

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 2, 3], 'col2': [1, 2, 3, 4, 5]})

def my_func(df):
    # example function: returns one value per row of the group
    return df['col2'] + 1

# let's say you want to groupby col1 and operate on the rest of the columns
group_object = []
for group_name, df_chunk in df.groupby('col1'):
    df_chunk = df_chunk.copy()  # work on a copy to avoid SettingWithCopyWarning
    df_chunk['result'] = my_func(df_chunk)
    group_object.append(df_chunk)

group_object[0]:

    col1    col2    result
0   1       1       2
1   1       2       3
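
If my_func returns a single float per group, as in the question, one possible alternative to the loop is to compute one value per date with groupby(...).apply and then map it back onto the rows. The following is only a sketch under assumptions: the body of my_func here is a made-up placeholder (the question does not say what the real function computes), and the data is the sample frame from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2022/03/01"] * 3 + ["2022/03/02"] * 3 + ["2022/03/03"] * 3,
    "col1": [1, 3, 5, 6, 2, 8, 2, 10, 9],
    "col2": [5, 6, 7, 8, 9, 10, 11, 12, 13],
    "col3": [10, 12, 14, 15, 17, 19, 21, 22, 23],
})

def my_func(group):
    # Hypothetical placeholder: any computation that uses several
    # columns of the group and returns one float.
    return float((group["col1"] * group["col2"]).sum() / group["col3"].sum())

# apply() hands my_func a DataFrame holding the selected columns of each group;
# the result is a Series with one value per date, indexed by date.
per_date = df.groupby("date")[["col1", "col2", "col3"]].apply(my_func)

# Broadcast the per-date value back to every row of that date.
df["result"] = df["date"].map(per_date)
```

Selecting the value columns explicitly before apply keeps the grouping column out of the frame passed to my_func; if the function needs the date as well, it is still available as the group key in the index of per_date.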

Tags: python, pandas, pandas-groupby