Pandas: how to pass arguments to DataFrame.assign to add multiple new columns?

How can assign be used to return a copy of the original DataFrame with multiple new columns added?

Desired result:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1.

Background:

The assign function in pandas takes a copy of the relevant DataFrame and joins the newly assigned columns to it, e.g.

>>> df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28
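
As a minimal sketch of the copy behaviour described above (the name `result` is only illustrative), the original DataFrame should stay unchanged unless the return value is assigned back to it:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# assign returns a new DataFrame; df itself is not modified
result = df.assign(C=df.B * 2)

print(list(df.columns))      # ['A', 'B']
print(list(result.columns))  # ['A', 'B', 'C']
```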

The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame.

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

Also:

Parameters:
kwargs : keyword, value pairs

keywords are the column names.
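
One way around the quoted restriction on referencing columns created in the same call is to chain two assign calls. This is only a sketch: the column D defined here (C + B) is a made-up example and differs from the D in the question.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# The second assign is called on the DataFrame returned by the first,
# so its lambda can see the new column C.
result = (
    df.assign(C=lambda d: d.A ** 2)
      .assign(D=lambda d: d.C + d.B)  # hypothetical column, for illustration
)

print(result['D'].tolist())  # [12, 16, 22, 30]
```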

The source code for the function states that it accepts a dictionary:

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0
    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed 
        on the DataFrame and assigned to the new columns. If the values are not callable, 
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. The make things predicatable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
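
The docstring's distinction between callable and non-callable values can be illustrated with a short sketch. The column E and its scalar value are assumptions added purely for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

result = df.assign(
    C=lambda d: d.A ** 2,  # callable: called with the DataFrame, result is assigned
    D=df.B * 2,            # Series: assigned as-is
    E=0,                   # scalar (hypothetical column): repeated for every row
)

print(result['C'].tolist())  # [1, 4, 9, 16]
print(result['E'].tolist())  # [0, 0, 0, 0]
```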

You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2)

I got your example dictionary to work by using ** to unpack it into keyword arguments:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})

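It seems that assign should be able to accept a dictionary, but judging from the source code you posted, it does not appear to be supported currently.

Resulting output: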

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
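
The same ** unpacking is handy when the new column names are only known at run time. A minimal sketch, assuming a made-up naming scheme (`<col>_squared`) and Python 3.6+ for the f-string:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Build the mapping of new column names to values programmatically;
# the "_squared" suffix is a hypothetical naming convention.
new_cols = {f'{col}_squared': df[col] ** 2 for col in ['A', 'B']}

# Unpack the dictionary so each key becomes a keyword argument
result = df.assign(**new_cols)

print(list(result.columns))  # ['A', 'B', 'A_squared', 'B_squared']
```

The keys must be strings, since they are passed to assign as keyword names.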