Confusing reference/assignment behaviour in Pandas when swapping two columns
I recently had to swap two columns, x and y, in a Pandas DataFrame.
Normally I would do something like the following (here with plain Python lists):
x = ['A' for i in range(2)]
y = ['B' for i in range(2)]
print([x, y])
# [['A', 'A'], ['B', 'B']]
tmp = x
x = y
y = tmp
print([x, y])
# [['B', 'B'], ['A', 'A']]
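With lists, tmp = x only binds a second name to the same list object, and x = y rebinds the name x without modifying any list, so tmp still holds ['A', 'A'] and the swap works. The same swap can be written with tuple unpacking. In the sketch below, the names orig_x and orig_y are added purely for illustration, so the identity checks can show that no list is copied or changed.

```python
x = ['A' for i in range(2)]
y = ['B' for i in range(2)]
orig_x, orig_y = x, y  # illustrative handles on the two original list objects

# Tuple unpacking evaluates the right-hand side first, then rebinds both names
x, y = y, x

print([x, y])  # [['B', 'B'], ['A', 'A']]
# Nothing was copied or mutated: each name now points at the other object
print(x is orig_y, y is orig_x)  # True True
```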
Doing the same with the DataFrame's columns doesn't quite work:
df = pd.DataFrame()
df['x'] = ['A' for i in range(2)]
df['y'] = ['B' for i in range(2)]
print(df)
# x y
#0 A B
#1 A B
tmp = df['x']
df['x'] = df['y']
df['y'] = tmp
print(df)
# x y
#0 B B
#1 B B
print(tmp)
#0 B
#1 B
#Name: x, dtype: object
What is going on here? I suspect it has to do with pass-by-reference versus pass-by-value, but I couldn't find anything more specific.
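One way to see the difference: df['x'] = df['y'] is not a name rebinding like x = y in the list version. It calls DataFrame.__setitem__, which mutates df. The plain-list sketch below is only an analogy, assuming (as the printed output suggests for pandas 0.25) that tmp = df['x'] returns a Series backed by the column's existing storage and that the assignment writes the new values into that storage in place. Slice assignment plays the role of that in-place write. Newer pandas releases, particularly with Copy-on-Write enabled, may not reproduce the original output.

```python
x = ['A' for i in range(2)]
y = ['B' for i in range(2)]

tmp = x      # alias: tmp and x are the same list object
x[:] = y     # in-place write into that object (a mutation, not a rebinding)
y[:] = tmp   # tmp already holds the new contents, so nothing gets swapped

print([x, y])  # [['B', 'B'], ['B', 'B']]
print(tmp)     # ['B', 'B']
```

Replacing tmp = x with tmp = x.copy() breaks the aliasing and the swap then works, which mirrors the .copy() fix for the Series.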
For reference, the correct way to swap the columns is:
# Correct way
df = df.rename({'x':'y', 'y':'x'}, axis=1)
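A minimal sketch of what rename does here: it swaps the column labels rather than moving any data, so each set of values ends up under the other name while the column positions stay where they were (the frame then lists y first). The final reordering step is an optional addition, not part of the question, for when the original column order matters.

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'A'], 'y': ['B', 'B']})

# Swap the labels: the old 'x' column is now called 'y' and vice versa
df = df.rename({'x': 'y', 'y': 'x'}, axis=1)
print(list(df.columns))  # ['y', 'x']

# Optional: select the columns to restore the original x, y order
df = df[['x', 'y']]
print(df)
```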
Packages:
import pandas as pd
import numpy as np
print(pd.__version__)
# 0.25.0
Use:
df = pd.DataFrame()
df['x'] = ['A' for i in range(2)]
df['y'] = ['B' for i in range(2)]
print(df)
# x y
#0 A B
#1 A B
tmp = df['x'].copy()
df['x'] = df['y']
df['y'] = tmp
print(df)
Output:
x y
0 A B
1 A B
x y
0 B A
1 B A
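If you prefer to keep the labels fixed and move the data instead, both columns can also be assigned in one step from an independent NumPy array. The sketch below is an alternative to the .copy() approach above, not something from the original answers. Because the right-hand side is materialised as a plain array before anything is written, no later write can change the data being copied, and pandas has no column labels to align on. to_numpy() needs pandas 0.24 or later; .values is the older equivalent.

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'A'], 'y': ['B', 'B']})

# Build the swapped contents as a standalone array first, then write both columns
df[['x', 'y']] = df[['y', 'x']].to_numpy()
print(df)
```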
But as you pointed out, rename is the right choice.
When we assign a DataFrame to a new variable using =, we are not creating a new copy of the DataFrame. We are merely adding a new name to call the same object.
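A minimal sketch of that point, using a small made-up frame: plain assignment gives a second name for the same DataFrame, so a change made through one name is visible through the other, while DataFrame.copy() (a deep copy by default) gives an independent object.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2]})
b = a            # no copy: b is just another name for the same DataFrame
c = a.copy()     # deep copy by default: an independent DataFrame

b.loc[0, 'x'] = 99   # modifies the single object that both a and b refer to

print(b is a)          # True
print(a.loc[0, 'x'])   # 99
print(c.loc[0, 'x'])   # 1
```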