Pandas: group by two columns and swap values in four other columns

Question

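I have a large Pandas DataFrame with many columns. I want to make sure that columns C and E hold their values in a consistent order.

For example, if the first two rows show (Red, Green) and the third row shows (Green, Red), the third row should be changed to (Red, Green), as shown below.

Input

Output

Additional task:

When making this change, I also want to swap the values of four other columns (two pairs) in the same row.

Input

Output

Note: when we apply a group by, the group also includes the row highlighted below, but I do not want to swap that row's values, because it already follows the standard sequence: red first, green second.

I tried the function below, but after entering a few hundred combinations it becomes hard to keep track of them all by hand. The file is large, with many rows and columns.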

def swap(x):
    if x[0] < 0:
        return [x[1],x[0]]
    else:
        return [x[0],x[1]]

Is there a way to swap multiple values under a given condition?

Edit 1: after Rob Raymond's answer

import pandas as pd
import itertools
import random

df = pd.read_excel(r"Path\test_copy.xlsx")  # my original Excel sheet with all the data (raw string so "\t" is not read as a tab)

colors1 = []
colors2 = []
colors = []
colors1 = df['C'].values.tolist()
colors2 = df['E'].values.tolist()
colors = colors1 + colors2
colors = list( dict.fromkeys(colors) )
colorp = list(itertools.permutations(colors, 2))

df1 = pd.DataFrame([pd.Series(colorp[random.randint(0,len(colorp)-1)]).rename({0:"C",1:"E"}).to_dict() for i in range(20)])

# find rows where colors in different order to a previous combination
df2 = df.assign(swap=df.apply(lambda r: ((df.loc[(df.C.eq(r.E)&df.E.eq(r.C))].index.values)<r.name).any(), axis=1))

# swap the columns, can be extended to other columns
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"C":"E","E":"C"})
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"M":"N","N":"M"})  # both directions are needed to swap M and N
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"G":"I","I":"G"})


# lets compare what's happened...
df2.join(df, rsuffix="_start")

df2.to_excel(r"PAth\result_swapped.xlsx", index=None, header=True)

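The values in all six columns are swapped together as expected, but the result is not accurate. The output file still contains some rows where columns "C" and "E" are in the opposite sequence. For those wrong-sequence rows the swap flag is True, which means the original sequence was correct but the script swapped it anyway.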
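
One possible explanation for these wrongly swapped rows is that the flag is computed against the original, unswapped frame: a row is flagged whenever any earlier row holds the reversed pair, even if a still earlier row already established the same order. The sketch below reproduces this on three hypothetical rows (the data is made up for illustration; only the flag expression comes from the code above).

```python
import pandas as pd

# Hypothetical three-row sample: row 2 repeats the order set by row 0.
df = pd.DataFrame({"C": ["Red", "Green", "Red"],
                   "E": ["Green", "Red", "Green"]})

# Same flag expression as in the script above.
swap = df.apply(
    lambda r: ((df.loc[df.C.eq(r.E) & df.E.eq(r.C)].index.values) < r.name).any(),
    axis=1,
)

# Row 1 is flagged because row 0 is its reverse (intended).
# Row 2 is also flagged because row 1 is its reverse, although row 2
# already matches row 0, so swapping it produces the wrong order.
print(swap.tolist())
```

If the real data contains a pair in both orders more than once, later rows in the original order would be flagged in the same way.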

Answer 1

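Simulating your data
Simulated condition: an earlier row has the same two values in the opposite column order
The columns are swapped using a boolean mask and rename()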

import itertools
import random

import pandas as pd
colors = ["Red","Green","Blue","Purple","Indigo","Pink"]

colorp = list(itertools.permutations(colors, 2))

df = pd.DataFrame([pd.Series(colorp[random.randint(0,len(colorp)-1)]).rename({0:"C",1:"E"}).to_dict() for i in range(20)])

# find rows where colors in different order to a previous combination
df2 = df.assign(swap=df.apply(lambda r: ((df.loc[(df.C.eq(r.E)&df.E.eq(r.C))].index.values)<r.name).any(), axis=1))

# swap the columns, can be extended to other columns
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"C":"E","E":"C"})

# lets compare what's happened...
df2.join(df, rsuffix="_start")

	C	E	swap	C_start	E_start
0	Green	Indigo	False	Green	Indigo
1	Pink	Red	False	Pink	Red
2	Indigo	Blue	False	Indigo	Blue
3	Green	Blue	False	Green	Blue
4	Green	Indigo	True	Indigo	Green
5	Indigo	Blue	True	Blue	Indigo
6	Pink	Purple	False	Pink	Purple
7	Indigo	Blue	True	Blue	Indigo
8	Green	Pink	False	Green	Pink
9	Red	Blue	False	Red	Blue
10	Red	Indigo	False	Red	Indigo
11	Red	Purple	False	Red	Purple
12	Green	Indigo	True	Indigo	Green
13	Pink	Purple	True	Purple	Pink
14	Green	Indigo	True	Indigo	Green
15	Purple	Indigo	False	Purple	Indigo
16	Indigo	Blue	True	Blue	Indigo
17	Green	Blue	False	Green	Blue
18	Red	Green	False	Red	Green
19	Indigo	Green	True	Green	Indigo
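
Note that in the output above, row 19 starts as (Green, Indigo), the same order as row 0, yet it is flagged and ends up as (Indigo, Green). A hedged alternative is to treat the first occurrence of each unordered pair as the standard order and flag only the rows that differ from it. The following is a minimal sketch under that assumption, not the answer's method; the sample data and the column pairs G/I and M/N are hypothetical stand-ins for the real columns.

```python
import pandas as pd

# Hypothetical sample; G/I and M/N stand in for the two extra column pairs.
df = pd.DataFrame({
    "C": ["Red", "Red", "Green", "Red", "Blue"],
    "E": ["Green", "Green", "Red", "Green", "Red"],
    "G": [1, 2, 3, 4, 5],
    "I": [10, 20, 30, 40, 50],
    "M": ["a", "b", "c", "d", "e"],
    "N": ["v", "w", "x", "y", "z"],
})

# Order-independent key: the smaller and the larger value of each (C, E) pair.
lo = pd.Series([min(c, e) for c, e in zip(df["C"], df["E"])], index=df.index)
hi = pd.Series([max(c, e) for c, e in zip(df["C"], df["E"])], index=df.index)

# Value of C in the first row of each unordered pair = the standard order.
first_c = df.groupby([lo, hi])["C"].transform("first")

# Flag only rows whose C differs from that first occurrence.
swap = df["C"].ne(first_c)

# Swap each column pair on the flagged rows; to_numpy() bypasses label alignment.
for a, b in [("C", "E"), ("G", "I"), ("M", "N")]:
    df.loc[swap, [a, b]] = df.loc[swap, [b, a]].to_numpy()

print(df)
```

In this sample only row 2 is swapped; row 3, which follows a reversed row but already matches the first occurrence, is left alone. The key compares values exactly, so "Green" and "green" would count as different values unless they are normalized first.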
