Using a variable for the group by method in Python pandas
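I have a group by inside a function, and I want to pass in the aggregation method. The syntax worked until I switched it to a variable. Here is my dataframe: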
import pandas as pd
import numpy as np

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)],
                   'units': [np.random.randint(100, 999)
                             for _ in range(12)]})
Here is my function, which does not work:
def create_all_summary(df,features,column_to_aggregate,agg_method):
    df_output = df.groupby(features)[column_to_aggregate].agg_method()
    return df_output

test = create_all_summary(df,['state'],['sales','units'],'sum')
The error says: "*** AttributeError: 'DataFrameGroupBy' object has no attribute 'agg_method'"
This is what I am trying to do (hard-coded):
test = df.groupby(['state', 'office_id'])[['sales', 'units']].sum()
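I want my function to give the same kind of result.
You can tweak it like this: pass the method name to .agg() instead of calling .agg_method(), which Python treats as an attribute literally named agg_method: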
In [1087]: def create_all_summary(df,features,column_to_aggregate,agg_method):
      ...:     df_output = df.groupby(features)[column_to_aggregate].agg(agg_method)
      ...:     return df_output
      ...:

In [1089]: test = create_all_summary(df,['state'],['sales','units'],'sum')

In [1090]: test
Out[1090]:
         sales  units
state
AZ     1959019   1651
CA     1170343   1029
CO     1502538   1367
WA      800080   1872
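The original call failed because writing .agg_method() asks the groupby object for an attribute literally named agg_method; the value held in the variable is never consulted. As a minimal sketch of an alternative to the .agg() approach (not something the answer above proposes), Python's built-in getattr can resolve the method from its name. The helper names summarize_with_getattr and summarize_with_agg and the four-row frame are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's random data).
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'sales': [10, 20, 30, 40],
    'units': [1, 2, 3, 4],
})

def summarize_with_getattr(df, features, columns, agg_method):
    """Resolve the aggregation by name: 'sum' becomes grouped.sum()."""
    grouped = df.groupby(features)[columns]
    return getattr(grouped, agg_method)()

def summarize_with_agg(df, features, columns, agg_method):
    """Hand the name to .agg(), as in the answer above."""
    grouped = df.groupby(features)[columns]
    return grouped.agg(agg_method)

by_getattr = summarize_with_getattr(df, ['state'], ['sales', 'units'], 'sum')
by_agg = summarize_with_agg(df, ['state'], ['sales', 'units'], 'sum')

print(by_getattr)
print(by_agg)
```

Both helpers return the same per-state totals here. Passing the name to .agg() is usually the simpler choice, since getattr only works when the string is the name of an existing groupby method.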
Use the agg method; aggregation names such as count, sum, etc. can be passed to it.
Code:
import pandas as pd
import numpy as np

def create_all_summary(df, features, column_to_aggregate, agg_method):
    # Pass the aggregation name (e.g. 'sum', 'count') to .agg()
    df_output = df.groupby(features)[column_to_aggregate].agg(agg_method)
    return df_output

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)],
                   'units': [np.random.randint(100, 999)
                             for _ in range(12)]})

# Hard-coded version; selecting several columns needs a list (double brackets)
test = df.groupby(['state', 'office_id'])[['sales', 'units']].sum()
print(test)

# Same idea through the function, with the aggregation passed as a string
test1 = create_all_summary(df, ['state'], ['sales', 'units'], 'sum')
print(test1)
Output:
                  sales  units
state office_id
AZ    2          222579    651
      4          252315    496
      6          835831    949
CA    1          405711    170
      3          710581    187
      5          982371    414
CO    1          404137    586
      3          217952    700
      5          474564    700
WA    2          535829    572
      4          548242    274
      6          459783    805

         sales  units
state
AZ     1310725   2096
CA     2098663    771
CO     1096653   1986
WA     1543854   1651
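Because the function forwards agg_method straight to .agg(), the same parameter can carry more than a single name. The sketch below reuses create_all_summary on a small hypothetical frame (not the question's random data) and passes a string, a list of names, a per-column dict, and a callable, all of which .agg() accepts.

```python
import pandas as pd

# Small illustrative frame (hypothetical data).
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CA', 'WA'],
    'sales': [10, 20, 30, 40],
    'units': [1, 2, 3, 4],
})

def create_all_summary(df, features, column_to_aggregate, agg_method):
    # agg_method is handed to .agg() unchanged
    return df.groupby(features)[column_to_aggregate].agg(agg_method)

cols = ['sales', 'units']

# A single aggregation name
totals = create_all_summary(df, ['state'], cols, 'sum')
counts = create_all_summary(df, ['state'], cols, 'count')

# A list of names gives two-level columns: (column, aggregation)
min_max = create_all_summary(df, ['state'], cols, ['min', 'max'])

# A dict picks a different aggregation per column
per_column = create_all_summary(df, ['state'], cols,
                                {'sales': 'sum', 'units': 'mean'})

# A callable is applied to each column within each group
spread = create_all_summary(df, ['state'], cols,
                            lambda s: s.max() - s.min())

for result in (totals, counts, min_max, per_column, spread):
    print(result, end='\n\n')
```

Keeping a single parameter for all of these forms means callers do not need a separate function per aggregation style.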