Pandas: group by two columns with swapped values in four other columns
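I have a large Pandas DataFrame with many columns. I want to make sure that columns C and E hold their values in the same order.
For example, if the first two rows show (Red, Green) and the third row shows (Green, Red), then the third row should be changed to (Red, Green), as shown below.
Input
Output
Additional task:
When making this change, I also want to swap the values of four other columns (2 pairs) in the same row.
Input
Output
Note: when we apply a group by, it also includes the rows highlighted below, but I do not want to swap those values, because they already follow the standard sequence: Red first, Green second.
I have tried the function below, but after a few hundred entries it becomes hard to keep track of all the combinations manually. The file is large, with many rows and columns.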
def swap(x):
    # Return the pair reversed when its first element is negative, unchanged otherwise
    if x[0] < 0:
        return [x[1], x[0]]
    else:
        return [x[0], x[1]]
Is there a way to swap multiple values under a given condition?
Edit 1: after Rob Raymond's answer
import pandas as pd
import itertools
import random
df = pd.read_excel(r"Path\test_copy.xlsx")  # My original Excel sheet, which contains all the data
# unique colors across columns C and E, in order of first appearance
colors1 = df['C'].values.tolist()
colors2 = df['E'].values.tolist()
colors = colors1 + colors2
colors = list(dict.fromkeys(colors))
colorp = list(itertools.permutations(colors, 2))
# simulated frame carried over from the answer (not used below)
df1 = pd.DataFrame([pd.Series(colorp[random.randint(0,len(colorp)-1)]).rename({0:"C",1:"E"}).to_dict() for i in range(20)])
# find rows where colors in different order to a previous combination
df2 = df.assign(swap=df.apply(lambda r: ((df.loc[(df.C.eq(r.E)&df.E.eq(r.C))].index.values)<r.name).any(), axis=1))
# swap the columns, can be extended to other columns
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"C":"E","E":"C"})
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"M":"N","N":"M"})
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"G":"I","I":"G"})
# let's compare what's happened...
df2.join(df, rsuffix="_start")
df2.to_excel(r"PAth\result_swapped.xlsx", index=False, header=True)
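The values in all six columns are swapped together as expected, but the result is not accurate. The output file still contains some values in columns "C" and "E" in the opposite sequence. For those wrong-sequence rows, the swap status is "TRUE". That means the original sequence was correct, but our script swapped it.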
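One possible explanation for this symptom is that the mask flags a row whenever any earlier row holds the reverse pair, even when a still earlier row already established the order the current row uses. Below is a minimal sketch of an alternative, assuming the first occurrence of each unordered (C, E) pair defines the standard order. `normalize_pair_order` and its parameters are hypothetical names, and the sample data is made up; the column pairs G/I and M/N are taken from the script above. It swaps via `.to_numpy()` so that values are assigned by position instead of being re-aligned on column labels.

```python
import pandas as pd


def normalize_pair_order(df, key=("C", "E"), extra_pairs=(("G", "I"), ("M", "N"))):
    """Orient each (key[0], key[1]) pair like its first occurrence (hypothetical helper).

    Rows that get flipped also have every pair in extra_pairs swapped.
    Returns the corrected copy and the boolean mask of flipped rows.
    """
    a, b = key
    out = df.copy()
    standard = {}  # unordered pair -> value that belongs in column `a`
    flip = []
    for left, right in zip(out[a], out[b]):
        # The first row seen for an unordered pair fixes its standard order.
        first_left = standard.setdefault(frozenset((left, right)), left)
        flip.append(left != first_left)
    mask = pd.Series(flip, index=out.index)
    rows = mask.to_numpy()
    if rows.any():
        for x, y in (key, *extra_pairs):
            # .to_numpy() drops the labels, so the values really trade places.
            out.loc[rows, [x, y]] = out.loc[rows, [y, x]].to_numpy()
    return out, mask


# Made-up sample: row 2 is reversed, row 3 already follows the standard order.
sample = pd.DataFrame({
    "C": ["Red", "Red", "Green", "Red", "Green"],
    "E": ["Green", "Green", "Red", "Green", "Blue"],
    "G": [1, 2, 3, 4, 5],
    "I": [10, 20, 30, 40, 50],
    "M": ["a", "b", "c", "d", "e"],
    "N": ["v", "w", "x", "y", "z"],
})
fixed, mask = normalize_pair_order(sample)
print(fixed.assign(swap=mask))
```

Only row 2 is flipped here; row 3 stays untouched even though a reversed row precedes it, because row 0 already fixed (Red, Green) as the standard order.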
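- simulated your data
- simulated the condition: an earlier row holds the two columns in the opposite order
- swapping the columns is done with a mask and rename()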
import itertools
import random
import pandas as pd
colors = ["Red","Green","Blue","Purple","Indigo","Pink"]
colorp = list(itertools.permutations(colors, 2))
# 20 random (C, E) pairs of distinct colors
df = pd.DataFrame([pd.Series(colorp[random.randint(0,len(colorp)-1)]).rename({0:"C",1:"E"}).to_dict() for i in range(20)])
# find rows where colors in different order to a previous combination
df2 = df.assign(swap=df.apply(lambda r: ((df.loc[(df.C.eq(r.E)&df.E.eq(r.C))].index.values)<r.name).any(), axis=1))
# swap the columns, can be extended to other columns
df2.loc[df2.swap] = df2.loc[df2.swap].rename(columns={"C":"E","E":"C"})
# let's compare what's happened...
df2.join(df, rsuffix="_start")
| | C | E | swap | C_start | E_start |
|---|---|---|---|---|---|
| 0 | Green | Indigo | False | Green | Indigo |
| 1 | Pink | Red | False | Pink | Red |
| 2 | Indigo | Blue | False | Indigo | Blue |
| 3 | Green | Blue | False | Green | Blue |
| 4 | Green | Indigo | True | Indigo | Green |
| 5 | Indigo | Blue | True | Blue | Indigo |
| 6 | Pink | Purple | False | Pink | Purple |
| 7 | Indigo | Blue | True | Blue | Indigo |
| 8 | Green | Pink | False | Green | Pink |
| 9 | Red | Blue | False | Red | Blue |
| 10 | Red | Indigo | False | Red | Indigo |
| 11 | Red | Purple | False | Red | Purple |
| 12 | Green | Indigo | True | Indigo | Green |
| 13 | Pink | Purple | True | Purple | Pink |
| 14 | Green | Indigo | True | Indigo | Green |
| 15 | Purple | Indigo | False | Purple | Indigo |
| 16 | Indigo | Blue | True | Blue | Indigo |
| 17 | Green | Blue | False | Green | Blue |
| 18 | Red | Green | False | Red | Green |
| 19 | Indigo | Green | True | Green | Indigo |
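The mask condition can be traced by hand on a three-row frame (made-up data, not taken from the post). Row 1 is flagged because row 0 holds its reverse. Row 2 is flagged as well, since row 1 precedes it with the reverse pair, even though row 2 already matches row 0. Row 19 in the table above shows the same pattern: it starts as (Green, Indigo), like row 0, and still ends up swapped. The helper name `reversed_earlier` is only for illustration; its body is the same expression as the lambda above.

```python
import pandas as pd

# Made-up frame: rows 0 and 2 share one order, row 1 is the reverse.
df = pd.DataFrame({"C": ["Red", "Green", "Red"],
                   "E": ["Green", "Red", "Green"]})


def reversed_earlier(r):
    # Index labels of all rows that hold this row's pair in the opposite order.
    opposite = df.loc[df.C.eq(r.E) & df.E.eq(r.C)].index.values
    # True when at least one of those rows comes before the current row.
    return (opposite < r.name).any()


mask = df.apply(reversed_earlier, axis=1)
print(df.assign(swap=mask))
```

Under this condition a row is compared against every earlier reversed row, not against the first occurrence of its pair, which may be why some rows that were already in the intended order come out swapped.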