Confusing reference/assignment behaviour in Pandas when swapping two columns

I recently had to swap two columns, x and y, in a Pandas DataFrame. Normally, I would do something like the following (shown here with plain Python lists):

x = ['A' for i in range(2)]
y = ['B' for i in range(2)]
print([x, y])
# [['A', 'A'], ['B', 'B']]

tmp = x
x = y
y = tmp
print([x, y])
# [['B', 'B'], ['A', 'A']]

Doing the same with the columns of a DataFrame does not quite work:

df = pd.DataFrame()
df['x'] = ['A' for i in range(2)]
df['y'] = ['B' for i in range(2)]
print(df)
#   x  y
#0  A  B
#1  A  B

tmp = df['x']
df['x'] = df['y']
df['y'] = tmp
print(df)
#   x  y
#0  B  B
#1  B  B

print(tmp)
#0    B
#1    B
#Name: x, dtype: object

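What is going on here? I suspect it has something to do with pass-by-reference versus pass-by-value, but I could not find anything more specific.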

For reference, the correct way to swap the columns is:

# Correct way
df = df.rename({'x':'y', 'y':'x'}, axis=1)

Packages:

import pandas as pd
import numpy as np
print(pd.__version__)
# 0.25.0

Use copy() when saving the column to the temporary variable:

df = pd.DataFrame()
df['x'] = ['A' for i in range(2)]
df['y'] = ['B' for i in range(2)]
print(df)
#   x  y
#0  A  B
#1  A  B

tmp = df['x'].copy()
df['x'] = df['y']
df['y'] = tmp
print(df)

Output:

   x  y
0  A  B
1  A  B
   x  y
0  B  A
1  B  A

But as you pointed out, rename is the right choice.
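To make concrete what rename does here, the sketch below may help. It assumes a reasonably recent pandas, and the names relabelled and swapped are only illustrative. rename swaps the column labels and leaves the data where it is, so the result lists its columns as y, x; selecting the columns again restores the x, y order.

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'A'], 'y': ['B', 'B']})

# rename only swaps the labels; the underlying data is not moved
relabelled = df.rename(columns={'x': 'y', 'y': 'x'})
print(list(relabelled.columns))
# ['y', 'x']

# select the columns in the original order to get the usual x, y layout
swapped = relabelled[['x', 'y']]
print(swapped)
```

Because rename returns a new DataFrame, the original df is left unchanged unless the result is assigned back to it.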

When we assign a DataFrame to a new variable using =, we are not creating a new copy of the DataFrame. We are merely adding a new name to call the same object.
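The quoted point can be checked directly. This is a minimal sketch, with alias and independent as illustrative names that do not appear in the question:

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'A'], 'y': ['B', 'B']})

alias = df               # a second name for the same object
independent = df.copy()  # a new object with its own data

print(alias is df)        # True
print(independent is df)  # False

# a change made through df is visible through alias, but not in the copy
df.loc[0, 'x'] = 'Z'
print(alias.loc[0, 'x'])        # Z
print(independent.loc[0, 'x'])  # A
```

The same reasoning applies to tmp = df['x'] in the question: tmp is just a name for the column object that pandas hands back, which is why taking .copy() first makes the swap safe. Whether later writes to df show through such a column reference appears to depend on the pandas version, so the exact output in the question may not reproduce on newer releases.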

Full article here