How can I apply a user-defined function to each group in Python?
I have a DataFrame df1 as follows:
Country|Month|Revenue
-------|-----|-------
US |Jan |100
US |Feb |200
US |Mar |300
Canada |Jan |200
Canada |Feb |400
Canada |Mar |500
I want to apply a user-defined function as follows:
df3=df1.groupby(['Country'])['Revenue'].my_cool_func()
def my_cool_func():
    b = max(Revenue)-min(Revenue)
    c=b/2
    return c
The final output I want for df3 is:
Country|my_cool_func_rev
-------|----------------
US |100
Canada |150
How can I get the above output using a user-defined function?
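To reproduce the results, df1 can be rebuilt from the table above. A minimal sketch, assuming Month is stored as plain strings (the values are copied from the table):

```python
import pandas as pd

# df1 rebuilt from the question's table (assumption: Month is a plain string column)
df1 = pd.DataFrame({
    "Country": ["US", "US", "US", "Canada", "Canada", "Canada"],
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "Revenue": [100, 200, 300, 200, 400, 500],
})
print(df1)
```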
You can use GroupBy.apply; inside the function you are working with a Series, so you can use Series.max and Series.min:
def my_cool_func(x):
    # x is the Revenue Series of one country
    # print (x)  # uncomment to inspect each group
    return (x.max() - x.min()) / 2
df3=df1.groupby(['Country'])['Revenue'].apply(my_cool_func).reset_index()
print (df3)
  Country  Revenue
0  Canada    150.0
1      US    100.0
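The question asks for a column named my_cool_func_rev rather than Revenue. One way to get that, sketched here assuming df1 matches the question's table, is to pass name= to Series.reset_index:

```python
import pandas as pd

# df1 rebuilt from the question's table
df1 = pd.DataFrame({
    "Country": ["US", "US", "US", "Canada", "Canada", "Canada"],
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "Revenue": [100, 200, 300, 200, 400, 500],
})

def my_cool_func(x):
    # x is the Revenue Series of one country
    return (x.max() - x.min()) / 2

# Series.reset_index(name=...) sets the name of the value column
df3 = (df1.groupby("Country")["Revenue"]
          .apply(my_cool_func)
          .reset_index(name="my_cool_func_rev"))
print(df3)
```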
Or:
df3=df1.groupby(['Country'])['Revenue'].apply(lambda x:(x.max() - x.min()) / 2).reset_index()
print (df3)
  Country  Revenue
0  Canada    150.0
1      US    100.0
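Because this particular function only combines max and min, an alternative (not taken from the answer above) is to call the built-in GroupBy max and min once and do the arithmetic on the resulting Series. This avoids calling a Python function per group, which usually helps when there are many groups. df1 is again rebuilt from the question's table:

```python
import pandas as pd

# df1 rebuilt from the question's table
df1 = pd.DataFrame({
    "Country": ["US", "US", "US", "Canada", "Canada", "Canada"],
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "Revenue": [100, 200, 300, 200, 400, 500],
})

g = df1.groupby("Country")["Revenue"]
# g.max() and g.min() are Series indexed by Country; the arithmetic is vectorized
df3 = ((g.max() - g.min()) / 2).reset_index()
print(df3)
```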
EDIT: Using Series.std:
def my_cool_func(x):
    b = x.std()
    c=b/2
    return c
df3=df1.groupby(['Country'])['Revenue'].apply(my_cool_func).reset_index()
print (df3)
  Country    Revenue
0  Canada  76.376262
1      US  50.000000
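Note that Series.std defaults to ddof=1 (the sample standard deviation), which is what produces 76.376262 and 50.0 above. If the population standard deviation is wanted instead, pass ddof=0. A minimal sketch comparing both; the names sample and population are illustrative:

```python
import pandas as pd

# df1 rebuilt from the question's table
df1 = pd.DataFrame({
    "Country": ["US", "US", "US", "Canada", "Canada", "Canada"],
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "Revenue": [100, 200, 300, 200, 400, 500],
})

g = df1.groupby("Country")["Revenue"]
# Default ddof=1: sample standard deviation, halved
sample = g.apply(lambda x: x.std() / 2)
# ddof=0: population standard deviation, halved
population = g.apply(lambda x: x.std(ddof=0) / 2)
print(pd.DataFrame({"sample": sample, "population": population}))
```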
Another thing you can try, if you want to aggregate multiple columns, is groupby + agg:
def my_cool_func(x):
    return (x.max() - x.min()) / 2
Then you can simply do:
# column1 and columnOther stand for other columns of df
(df.groupby("Country")
   .agg({
       "column1": "sum",
       "Revenue": my_cool_func,
       # "columnOther": ...,  # one entry per further column
   }))
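With pandas 0.25 or later, named aggregation (output_name=(column, func)) lets you apply the same user-defined function to a real column of df1 and choose the output column name, which reproduces the my_cool_func_rev header from the question. The total_rev and months columns are illustrative extras, not part of the original answer:

```python
import pandas as pd

# df1 rebuilt from the question's table
df1 = pd.DataFrame({
    "Country": ["US", "US", "US", "Canada", "Canada", "Canada"],
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "Revenue": [100, 200, 300, 200, 400, 500],
})

def my_cool_func(x):
    return (x.max() - x.min()) / 2

# Each keyword is output_column=(input_column, aggregation)
df3 = (df1.groupby("Country")
          .agg(my_cool_func_rev=("Revenue", my_cool_func),
               total_rev=("Revenue", "sum"),    # illustrative extra
               months=("Month", "count"))       # illustrative extra
          .reset_index())
print(df3)
```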