Use Pandas groupby() + apply() with arguments

Question

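I want to use df.groupby() together with apply() to apply a function to each row of each group.

I usually use the following code, which normally works (note that this is without groupby()):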

df.apply(myFunction, args=(arg1,))

With groupby() I tried the following:

df.groupby('columnName').apply(myFunction, args=(arg1,))

However, I get the following error:

TypeError: myFunction() got an unexpected keyword argument 'args'

So my question is: how can I use groupby() and apply() with a function that needs arguments?

Answer 1

pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does.

So try this:

df.groupby('columnName').apply(lambda x: myFunction(x, arg1))

Or, as suggested, pass the argument positionally:

df.groupby('columnName').apply(myFunction, arg1)

Demo:

In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))

In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1

In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:

In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64

When using GroupBy.apply you can pass either a named argument:

In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30

or a positional argument (note that (10) is just the integer 10, not a tuple):

In [87]: df.groupby('a').apply(f, (10))
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
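The demo above uses random data, so here is a minimal reproducible sketch with the same values hard-coded. It assumes that selecting the non-grouping columns before calling apply is acceptable for your use case (this sidesteps version-dependent handling of the grouping column), and it only checks that the lambda, keyword, and positional forms agree.

```python
import pandas as pd

# Same values as the random frame shown in the demo, hard-coded for reproducibility
df = pd.DataFrame({
    "a": [0, 0, 3, 4, 3],
    "b": [3, 3, 0, 2, 4],
    "c": [1, 4, 4, 3, 1],
})


def f(group, n):
    # Column-wise maximum of the group, scaled by n
    return group.max() * n


# Select the value columns explicitly so only "b" and "c" reach f
grouped = df.groupby("a")[["b", "c"]]

by_lambda = grouped.apply(lambda g: f(g, 10))   # argument bound inside the lambda
by_keyword = grouped.apply(f, n=10)             # forwarded as a keyword argument
by_position = grouped.apply(f, 10)              # forwarded as a positional argument

# All three spellings should produce the same frame
assert by_lambda.equals(by_keyword)
assert by_keyword.equals(by_position)
print(by_keyword)
```

Which form to prefer is mostly a matter of taste; the keyword form tends to stay readable when the function takes several extra parameters.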

Answer 2

Some of the confusion over why using the args parameter raises an error may stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.

So when you call .apply on the DataFrame itself, you can use this parameter; when you call .apply on a groupby object, you cannot.
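To make the difference concrete, here is a toy sketch in plain Python. The two helper functions are hypothetical stand-ins that only mimic the shape of the two signatures; they are not pandas code. The point is that a function accepting *args/**kwargs forwards args=(10,) to the callee as a keyword, which is where the TypeError comes from.

```python
def frame_style_apply(func, data, args=()):
    """Hypothetical stand-in: `args` is a named parameter and gets unpacked."""
    return func(data, *args)


def groupby_style_apply(func, data, *args, **kwargs):
    """Hypothetical stand-in: extra arguments are forwarded unchanged."""
    return func(data, *args, **kwargs)


def scaled_max(values, n):
    # Largest value multiplied by n
    return max(values) * n


values = [1, 4, 2]

# `args` is understood and unpacked into positional arguments
frame_result = frame_style_apply(scaled_max, values, args=(10,))

# Extra positional or keyword arguments are passed straight through
positional_result = groupby_style_apply(scaled_max, values, 10)
keyword_result = groupby_style_apply(scaled_max, values, n=10)

# Here `args` lands in **kwargs and reaches scaled_max as a keyword
try:
    groupby_style_apply(scaled_max, values, args=(10,))
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(frame_result, positional_result, keyword_result)
print(error_message)
```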

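In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed as func (the first parameter); there is no need to specify additional *args/**kwargs, because arg1 is already bound inside the lambda.

An example: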

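import numpy as np
import pandas as pd

# Small example frame (values are illustrative only)
df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [3, 4, 5]})

# Called on DataFrame - `args` is a 1-tuple of extra positional arguments;
# here `0` is forwarded to np.sum as its axis argument
df.apply(np.sum, args=(0,))  # equiv to df.sum(0)
# `axis` is a parameter of DataFrame.apply itself, not of np.sum
df.apply(np.sum, axis=1)     # equiv to df.sum(1)


# Called on groupby object of the DataFrame - will throw TypeError,
# because `args` is forwarded to np.sum as a keyword argument
print(df.groupby('col1').apply(np.sum, args=(0,)))
# TypeError: sum() got an unexpected keyword argument 'args'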

Answer 3

For me, the following works:

df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))
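A possible alternative to the lambda, when several arguments need to be bound, is functools.partial. The sketch below is only an illustration: the frame, the body of my_function, and the values 10 and 1 are made up for the example, and the value column is selected explicitly so that only it reaches the function.

```python
from functools import partial

import pandas as pd

# Illustrative data; the column names are assumptions for this sketch
df = pd.DataFrame({
    "columnName": ["x", "x", "y"],
    "value": [1, 5, 3],
})


def my_function(group, arg1, arg2):
    # Hypothetical body: scale the group's maximum by arg1, then add arg2
    return group["value"].max() * arg1 + arg2


grouped = df.groupby("columnName")[["value"]]

# Bind the extra arguments with a lambda ...
via_lambda = grouped.apply(lambda g: my_function(g, 10, 1))
# ... or with functools.partial, which leaves only `group` to be supplied
via_partial = grouped.apply(partial(my_function, arg1=10, arg2=1))

assert via_lambda.equals(via_partial)
print(via_lambda)
```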
