Pandas: how to pass DataFrame.assign arguments to add multiple new columns?
How can assign be used to return a copy of the original DataFrame with multiple new columns added?
Desired result:
>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
The example above results in:
ValueError: Wrong number of items passed 2, placement implies 1
Background:
The assign function in Pandas takes a copy of the relevant DataFrame and adds the newly assigned columns to it, e.g.
>>> df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28
The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame:
Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.
In addition:
Parameters:
kwargs : keyword, value pairs
    keywords are the column names.
The source code for this function states that it accepts a dictionary:
def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0

    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed
        on the DataFrame and assigned to the new columns. If the values are not callable,
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. To make things predictable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """
    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():
        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
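To make the callable branch of the docstring above concrete, here is a minimal sketch. It assumes the same example DataFrame as earlier in the question; the column names C and D are only illustrative. A callable value is called with the copied DataFrame, while a non-callable value such as a Series is assigned directly.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# C is given as a callable: assign calls it with the DataFrame and stores the result.
# D is given as a Series: it is assigned as-is.
result = df.assign(C=lambda d: d['A'] ** 2, D=df['B'] * 2)

print(result)
```

Both forms are passed as keyword arguments, which is the only calling convention the signature above (`**kwargs`) declares.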
You can create multiple columns by supplying each new column as a keyword argument:
df = df.assign(C=df['A'] ** 2, D=df.B * 2)
I also got your example dictionary to work by using ** to unpack it into keyword arguments:
df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
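It seems like assign should be able to take a dictionary, but based on the source code you posted, that does not appear to be supported at the moment.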
The resulting output:
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
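If the new columns are collected in a dictionary ahead of time, the same unpacking approach can be sketched as follows. The variable name new_columns is hypothetical and used only for illustration; the rest mirrors the example from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Hypothetical mapping of new column names to the values they should hold.
new_columns = {
    'C': df['A'] ** 2,
    'D': df['B'] * 2,
}

# ** unpacks the dictionary so that each key becomes a keyword argument of assign.
result = df.assign(**new_columns)

# assign returns a copy, so the original DataFrame is left unchanged.
print(list(df.columns))      # ['A', 'B']
print(list(result.columns))  # ['A', 'B', 'C', 'D']
```

Passing the dictionary positionally, as in the question, does not fit the `**kwargs` signature; unpacking it is what turns the keys into column names.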