Using a variable for the groupby aggregation method in Python pandas

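I have a groupby inside a function, and I want to pass in the aggregation method. The syntax worked until I switched the method to a variable. Here is my DataFrame: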

import pandas as pd

import numpy as np
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                       'office_id': list(range(1, 7)) * 2,
                       'sales': [np.random.randint(100000, 999999)
                                 for _ in range(12)],
                       'units': [np.random.randint(100, 999)
                                 for _ in range(12)]})

Here is my function, which does not work:

def create_all_summary(df,features,column_to_aggregate,agg_method): 
    df_output = df.groupby(features)[column_to_aggregate].agg_method()
    return df_output

test = create_all_summary(df,['state'],['sales','units'],'sum')

The error says "*** AttributeError: 'DataFrameGroupBy' object has no attribute 'agg_method'".

This is what I am trying to do (hard-coded):

test = df.groupby(['state', 'office_id'])[['sales', 'units']].sum()

That is the kind of result I expect from my function.

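Writing .agg_method() makes Python look for an attribute literally named agg_method on the groupby object; the string stored in the variable is never used. Pass the variable to .agg() instead. You can adjust the function like this: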

In [1087]: def create_all_summary(df,features,column_to_aggregate,agg_method):
      ...:     df_output = df.groupby(features)[column_to_aggregate].agg(agg_method)
      ...:     return df_output
      ...: 

In [1089]: test = create_all_summary(df,['state'],['sales','units'],'sum')

In [1090]: test
Out[1090]: 
         sales  units
state                
AZ     1959019   1651
CA     1170343   1029
CO     1502538   1367
WA      800080   1872
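
If you would rather keep the attribute-style call, getattr can look the method up by name. The sketch below is only an illustration: the function name create_summary_getattr and the small hand-written frame are assumptions, not part of the question. It also assumes agg_method is the name of a real groupby method such as 'sum' or 'mean'.

```python
import pandas as pd

# Small hand-written frame (illustrative values, not the data from the question).
df = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                   'sales': [10, 20, 30, 40],
                   'units': [1, 2, 3, 4]})

def create_summary_getattr(df, features, column_to_aggregate, agg_method):
    """Hypothetical variant: resolve the aggregation method by its name."""
    grouped = df.groupby(features)[column_to_aggregate]
    # getattr(grouped, 'sum') returns the bound method grouped.sum; call it.
    return getattr(grouped, agg_method)()

via_getattr = create_summary_getattr(df, ['state'], ['sales', 'units'], 'sum')
via_agg = df.groupby(['state'])[['sales', 'units']].agg('sum')
print(via_getattr)
```

For method names like 'sum' both routes should give the same frame. .agg() is usually the more convenient one because it also accepts callables and lists of names.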

Use the agg method, which accepts the name of the aggregation (count, sum, and so on) as an argument.

Code:

import pandas as pd
import numpy as np

def create_all_summary(df,features,column_to_aggregate,agg_method): 
    df_output = df.groupby(features)[column_to_aggregate].agg(agg_method)
    return df_output

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                       'office_id': list(range(1, 7)) * 2,
                       'sales': [np.random.randint(100000, 999999)
                                 for _ in range(12)],
                       'units': [np.random.randint(100, 999)
                                 for _ in range(12)]})

# Select several columns with a list; the tuple form ['sales','units'] fails on pandas 2.0+.
test = df.groupby(['state', 'office_id'])[['sales', 'units']].sum()
print(test)

test1 = create_all_summary(df,['state'],['sales','units'],'sum')
print(test1)

Output:

                 sales  units
state office_id
AZ    2          222579    651
      4          252315    496
      6          835831    949
CA    1          405711    170
      3          710581    187
      5          982371    414
CO    1          404137    586
      3          217952    700
      5          474564    700
WA    2          535829    572
      4          548242    274
      6          459783    805
         sales  units
state
AZ     1310725   2096
CA     2098663    771
CO     1096653   1986
WA     1543854   1651
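
Because the function forwards agg_method straight to .agg(), it should also work with a callable or a list of method names, not just a single string. This is a minimal sketch on a small hand-written frame (the values are illustrative assumptions, not the random data above):

```python
import pandas as pd

def create_all_summary(df, features, column_to_aggregate, agg_method):
    # agg_method is forwarded unchanged, so anything .agg() accepts works here.
    return df.groupby(features)[column_to_aggregate].agg(agg_method)

# Illustrative data, chosen so the results are easy to check by hand.
df = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                   'sales': [10, 20, 30, 40],
                   'units': [1, 2, 3, 4]})

cols = ['sales', 'units']

# A single method name.
by_name = create_all_summary(df, ['state'], cols, 'max')

# A callable, applied to each selected column within each group.
by_callable = create_all_summary(df, ['state'], cols,
                                 lambda s: s.max() - s.min())

# A list of names; the result has two-level columns such as ('sales', 'sum').
by_list = create_all_summary(df, ['state'], cols, ['sum', 'mean'])

print(by_name)
print(by_callable)
print(by_list)
```

With a list of names, select a single result column with a tuple, for example by_list[('sales', 'mean')].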