How to pass *args in df.apply()

I have a function that I want to apply to a variable number of columns, depending on the input.

def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if not row[a]:
            combined.extend(row[a].split(delimiter))

    combined = list(set(combined))
    return combined

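But because of *args, I'm not sure how to apply this function to df. I'm not very familiar with *args and **kwargs in Python. I tried using partial with axis=1 as shown below, but got the TypeError that follows.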

df['combined'] = df.apply(partial(split_and_combine, ['col1','col2']),
                          axis=1)

TypeError: ('list indices must be integers or slices, not Series', 'occurred at index 0')

A dummy example for the code above. I'd like to be able to pass in a flexible number of columns to combine:

Index   col1        col2            combined
0      John;Mary    Sam;Bill;Eva    John;Mary;Sam;Bill;Eva
1      a;b;c        a;d;f           a;b;c;d;f

Thanks! If there is a better way to do this without df.apply, please feel free to comment!

From the df.apply documentation:

args : tuple

Positional arguments to pass to func in addition to the array/series.

**kwds

Additional keyword arguments to pass as keywords arguments to func.

df.apply(split_and_combine, args=('col1', 'col2'), axis=1)
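On why the partial attempt raised a TypeError: partial fills positional parameters from the left, so the list of column names was bound to row, and the real row that apply passes in ended up inside *args. The sketch below shows that binding with a hypothetical helper, show_binding, which is not part of the original code, and then a small wrapper (with_columns, also hypothetical) that keeps row as the first argument.

```python
from functools import partial

def show_binding(row, *args):
    # Hypothetical helper: report which value landed in which parameter
    return row, args

# partial binds the list to the FIRST positional parameter, i.e. `row`
bound = partial(show_binding, ['col1', 'col2'])
row, args = bound('actual row')
print(row)   # ['col1', 'col2']
print(args)  # ('actual row',)

def with_columns(func, *cols, **kwargs):
    # Hypothetical wrapper: keep `row` first and append the column names after it
    return lambda row: func(row, *cols, **kwargs)

wrapped = with_columns(show_binding, 'col1', 'col2')
print(wrapped('actual row'))  # ('actual row', ('col1', 'col2'))
```

In the original call, row[a] therefore indexed a list with a Series, which matches the "list indices must be integers or slices, not Series" message.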

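By the way, your function may have a few bugs: the "if not row[a]" condition looks inverted, and the result is returned as a list rather than a joined string. A corrected version: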

def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if row[a]:
            combined.extend(row[a].split(delimiter))
    combined = list(set(combined))
    return delimiter.join(combined)
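Putting the pieces together, here is a minimal runnable sketch using the sample data from the question. Two things here are assumptions rather than part of the original answer: dict.fromkeys is swapped in for list(set(...)) so that the output order matches the expected table, and every cell is assumed to be a string (a NaN cell would fail on .split). The last statement shows one possible way to get the same result without df.apply.

```python
import pandas as pd

def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if row[a]:
            combined.extend(row[a].split(delimiter))
    # dict.fromkeys drops duplicates but keeps first-seen order
    # (a substitution for list(set(...)), whose order is arbitrary)
    return delimiter.join(dict.fromkeys(combined))

df = pd.DataFrame({
    'col1': ['John;Mary', 'a;b;c'],
    'col2': ['Sam;Bill;Eva', 'a;d;f'],
})
cols = ['col1', 'col2']  # any number of column names

# args must be a tuple; extra keywords such as delimiter are forwarded to the function
df['combined'] = df.apply(split_and_combine, args=tuple(cols), axis=1, delimiter=';')

# Without apply: walk the selected columns row by row with zip
df['combined_no_apply'] = [
    ';'.join(dict.fromkeys(
        part for value in values if value for part in value.split(';')
    ))
    for values in zip(*(df[c] for c in cols))
]
print(df)
```

Building args from a list with tuple(cols) is what lets the number of columns vary from call to call.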