Pandas: group by two columns and swap values in four other columns

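I have a large Pandas DataFrame with many columns. I want to make sure that columns C and E hold their values in a consistent order.

For example, if the first two rows show (Red, Green) and the third row shows (Green, Red), then the third row should be changed to (Red, Green), as shown below.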

Input

Output

Additional task:

When making this change, I also want to swap the values of four other columns (2 pairs) in the same row.

Input

Output

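Note: when we apply a group by, it also includes the rows highlighted below, but I do not want to swap those values because they already follow the standard order, with red first and green second.

I have tried the function below, but after entering a few hundred combinations it becomes hard to keep track of them all by hand. The file is large, with many rows and columns.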

def swap(x):
    if x[0] < 0:
        return [x[1],x[0]]
    else:
        return [x[0],x[1]]

Is there a way to swap multiple values under a given condition?

Edit 1: after Rob Raymond's answer

import pandas as pd
import itertools
import random

df = pd.read_excel(r"Path\test_copy.xlsx")  # my original Excel sheet, which contains all the data (raw string so "\t" is not read as a tab)

colors1 = []
colors2 = []
colors = []
colors1 = df['C'].values.tolist()
colors2 = df['E'].values.tolist()
colors = colors1 + colors2
colors = list( dict.fromkeys(colors) )
colorp = list(itertools.permutations(colors, 2))

df1 = pd.DataFrame([pd.Series(colorp[random.randint(0,len(colorp)-1)]).rename({0:"C",1:"E"}).to_dict() for i in range(20)])

# find rows where colors in different order to a previous combination
df2 = df.assign(swap=df.apply(lambda r: ((df.loc[(df.C.eq(r.E)&df.E.eq(r.C))].index.values)<r.name).any(), axis=1))

# swap the columns, can be extended to other columns
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"C":"E","E":"C"})
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"M":"N","N":"M"})
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"G":"I","I":"G"})


# lets compare what's happened...
df2.join(df, rsuffix="_start")

df2.to_excel (r"PAth\result_swapped.xlsx", index = None, header=True)

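The values in all six columns are swapped together as expected, but the result is not accurate. The output file still contains some rows where columns "C" and "E" are in the opposite order. For those wrongly ordered rows, the swap flag is TRUE, which means the original order was correct and our script swapped it anyway.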
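Since tracking the combinations by hand is the hard part, a small check can list the rows whose (C, E) pair still appears in both orders somewhere in the frame. This is only a sketch: the helper name `inconsistent_pairs` and the sample frame are hypothetical, and it assumes the values in both columns are comparable (for example, all strings with no missing values).

```python
import pandas as pd

def inconsistent_pairs(df, a="C", b="E"):
    """Return rows whose unordered (a, b) pair occurs in both orders."""
    # Order-independent key: (Red, Green) and (Green, Red) map to the same tuple.
    key = pd.Series(
        [tuple(sorted(pair)) for pair in zip(df[a], df[b])],
        index=df.index,
    )
    # If column `a` takes more than one value within a key, both orders exist.
    n_orders = df[a].groupby(key).transform("nunique")
    return df.loc[n_orders > 1, [a, b]]

# Hypothetical sample data, for illustration only.
sample = pd.DataFrame({
    "C": ["Red", "Green", "Blue", "Red"],
    "E": ["Green", "Red", "Red", "Green"],
})
print(inconsistent_pairs(sample))
```

Running it on the frame before and after the swap gives a quick way to see whether any pair is still stored in two orders; an empty result means every pair has a single order.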

  • simulated your data
  • defined the condition: an earlier row has the two columns in the opposite order
  • swapping the columns uses a mask and rename()
  • done
import itertools
import random

import pandas as pd
colors = ["Red","Green","Blue","Purple","Indigo","Pink"]

colorp = list(itertools.permutations(colors, 2))

df = pd.DataFrame([pd.Series(colorp[random.randint(0,len(colorp)-1)]).rename({0:"C",1:"E"}).to_dict() for i in range(20)])

# find rows where colors in different order to a previous combination
df2 = df.assign(swap=df.apply(lambda r: ((df.loc[(df.C.eq(r.E)&df.E.eq(r.C))].index.values)<r.name).any(), axis=1))

# swap the columns, can be extended to other columns
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"C":"E","E":"C"})

# lets compare what's happened...
df2.join(df, rsuffix="_start")
         C       E   swap  C_start  E_start
0    Green  Indigo  False    Green   Indigo
1     Pink     Red  False     Pink      Red
2   Indigo    Blue  False   Indigo     Blue
3    Green    Blue  False    Green     Blue
4    Green  Indigo   True   Indigo    Green
5   Indigo    Blue   True     Blue   Indigo
6     Pink  Purple  False     Pink   Purple
7   Indigo    Blue   True     Blue   Indigo
8    Green    Pink  False    Green     Pink
9      Red    Blue  False      Red     Blue
10     Red  Indigo  False      Red   Indigo
11     Red  Purple  False      Red   Purple
12   Green  Indigo   True   Indigo    Green
13    Pink  Purple   True   Purple     Pink
14   Green  Indigo   True   Indigo    Green
15  Purple  Indigo  False   Purple   Indigo
16  Indigo    Blue   True     Blue   Indigo
17   Green    Blue  False    Green     Blue
18     Red   Green  False      Red    Green
19  Indigo   Green   True    Green   Indigo
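
In the sample output above, row 19 started as (Green, Indigo), the same order as row 0, yet it was flagged and swapped because row 4 (Indigo, Green) comes earlier. That looks like the symptom reported in the question's edit: a row that was already correct gets swapped. One possible alternative, sketched below, is to let the first row in which an unordered pair appears define its standard order and swap only the rows that disagree with it. The sample frame is made up, the pairs G/I and M/N are taken from the question's code, and the values in C and E are assumed to be comparable strings without missing values.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "C": ["Red", "Red", "Green", "Red", "Blue"],
    "E": ["Green", "Green", "Red", "Green", "Red"],
    "G": [1, 2, 3, 4, 5],
    "I": [10, 20, 30, 40, 50],
    "M": ["a", "b", "c", "d", "e"],
    "N": ["v", "w", "x", "y", "z"],
})

# Order-independent key for each (C, E) pair.
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df["C"], df["E"])],
    index=df.index,
)

# The first row of each unordered pair defines the standard order,
# so a row needs swapping only if its C differs from that first C.
first_c = df["C"].groupby(key).transform("first")
swap = df["C"].ne(first_c)

out = df.copy()
for left, right in [("C", "E"), ("G", "I"), ("M", "N")]:
    # .to_numpy() drops the column labels, so the values are assigned
    # by position and the two columns really trade places.
    out.loc[swap, [left, right]] = df.loc[swap, [right, left]].to_numpy()

print(out.assign(swap=swap))
```

Here only the third row (Green, Red) is flagged, and its G/I and M/N values move with it. The later (Red, Green) row is left alone even though a reversed row precedes it, because the comparison is against the first occurrence rather than against any earlier row.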