How to apply several functions to a single pandas dataframe column?

I'm curious whether it is possible to apply several functions to a single pandas dataframe column. For example, say I have three functions:

In:

def foo(col):
    if 'hi' in col:
        return 'TRUE'

def bar(col):
    if 'bye' in col:
        return 'TRUE'

def baz(col):
    if 'ok' in col:
        return 'TRUE'

And the following dataframe:

import pandas as pd

dfs = pd.DataFrame({'col':['The quick hi brown fox hi jumps over the lazy dog',
                           'The quick hi brown fox bye jumps over the lazy dog',
                           'The NO quick brown fox ok jumps bye over the lazy dog']})

If I want to apply each function to col, I would normally use the pandas apply function:

dfs['new_col1'] = dfs['col'].apply(foo)

dfs['new_col2'] = dfs['col'].apply(bar)

dfs['new_col3'] = dfs['col'].apply(baz)

dfs

Out:

    col     new_col1    new_col2    new_col3
0   The quick hi brown fox hi jumps over the lazy dog   TRUE    None    None
1   The quick hi brown fox bye jumps over the lazy...   TRUE    TRUE    None
2   The NO quick brown fox ok jumps bye over the l...   None    TRUE    TRUE

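However, as you can see, this creates 3 columns. So my question is: how can I efficiently apply the 3 functions above to a specific column at the same time in a large dataframe? The expected result is: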

    col                                                 new_col
0   The quick hi brown fox hi jumps over the lazy dog   TRUE
1   The quick hi brown fox bye jumps over the lazy...   TRUE, TRUE
2   The NO quick brown fox ok jumps bye over the l...   TRUE, TRUE

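Note that I know I could merge the 3 columns into one. Still, I would like to know whether the above is possible.

Why not combine all the functions into one giant function?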

def oneGiantFunc(col):
    def foo(col):
        if 'hi' in col:
            return 'TRUE'

    def bar(col):
        if 'bye' in col:
            return 'TRUE'

    def baz(col):
        if 'ok' in col:
            return 'TRUE'

    # Keep only the checks that matched; formatting the raw values
    # would put the string 'None' into the output.
    results = [foo(col), bar(col), baz(col)]
    return ', '.join(r for r in results if r is not None)

dfs['new_col'] = dfs['col'].apply(oneGiantFunc)

You can use apply with a list comprehension that filters out the None values:

dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([r for r in
                                            [foo(x), bar(x), baz(x)] if r is not None]))
print (dfs)
                                                 col     new_col
0  The quick hi brown fox hi jumps over the lazy dog        TRUE
1  The quick hi brown fox bye jumps over the lazy...  TRUE, TRUE
2  The NO quick brown fox ok jumps bye over the l...  TRUE, TRUE

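I don't think you can really do it 'at the same time'. But here are 2 options:

1. With the functions defined as above:

# foo, bar and baz return 'TRUE' or None, so convert each result to a boolean mask before combining with &
dfs['new_col1'] = (dfs['col'].apply(foo).notna() & dfs['col'].apply(bar).notna()) & dfs['col'].apply(baz).notna()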

2. Redefine the function:

def aao(col): # all at once
    if ('hi' in col) and ('bye' in col) and ('ok' in col):
        return 'TRUE'

dfs['new_col'] = dfs['col'].apply(aao)
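
Both options in this answer only flag rows where all three substrings are present, which is not the same as the expected output in the question. As a minimal sketch, assuming the checks are rewritten to return booleans (the names has_hi, has_bye and has_ok are hypothetical and not from the question), the per-function results could be collected into one boolean frame and then combined in either way:

```python
import pandas as pd

dfs = pd.DataFrame({'col': ['The quick hi brown fox hi jumps over the lazy dog',
                            'The quick hi brown fox bye jumps over the lazy dog',
                            'The NO quick brown fox ok jumps bye over the lazy dog']})

# Hypothetical boolean versions of foo, bar and baz
def has_hi(text):
    return 'hi' in text

def has_bye(text):
    return 'bye' in text

def has_ok(text):
    return 'ok' in text

# One boolean column per check, labelled via keys
masks = pd.concat(
    [dfs['col'].apply(f) for f in (has_hi, has_bye, has_ok)],
    axis=1, keys=['hi', 'bye', 'ok'])

all_match = masks.all(axis=1)    # True only where every check matched
match_count = masks.sum(axis=1)  # number of checks that matched per row
```

On the sample data no row contains all three substrings, so all_match is False everywhere, while match_count distinguishes the rows.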

Use a lambda function such as

lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)])

in the apply call. Full example:

In : dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)]))

In : dfs
Out: 
                                                 col     new_col
0  The quick hi brown fox hi jumps over the lazy dog        TRUE
1  The quick hi brown fox bye jumps over the lazy...  TRUE, TRUE
2  The NO quick brown fox ok jumps bye over the l...  TRUE, TRUE