Pandas: replacing values depending on the prior row
I am new to pandas and would appreciate advice on how to solve my problem. I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({'A': ["me", "you", "you", "me", "me", "me", "me"],
                   'B': ["Y", "X", "X", "X", "X", "X", "Z"],
                   'C': ["1", "2", "3", "4", "5", "6", "7"]})
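I need to transform it based on the row values in columns A and B. The logic is: whenever the values in columns A and B are the same on consecutive rows, the first row of that run should be kept as is, but every subsequent row in the run should have 'A' set in column B.
For example, the values in columns A and B are the same in rows 1 and 2, so the value in column B of row 2 should be replaced with A. This is my expected output: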
df2 = pd.DataFrame({'A': ["me", "you", "you", "me", "me", "me", "me"],
                    'B': ["Y", "X", "A", "X", "A", "A", "Z"],
                    'C': ["1", "2", "3", "4", "5", "6", "7"]})
You can first add columns A and B together (for strings, this concatenates them):
a = df.A + df.B
Then compare the result with its shifted version:
print (a != a.shift())
0 True
1 True
2 False
3 True
4 False
5 False
6 True
dtype: bool
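One caveat with concatenation: two different (A, B) pairs can produce the same string, for example "ab" + "c" and "a" + "bc". The following is a minimal sketch, not part of the original answer, that compares the two columns separately instead. The names `cols` and `changed` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': ["me", "you", "you", "me", "me", "me", "me"],
                   'B': ["Y", "X", "X", "X", "X", "X", "Z"]})

cols = df[['A', 'B']]
# True where at least one of the two columns differs from the previous row.
# The first row is compared against NaN, so it always counts as a change.
changed = cols.ne(cols.shift()).any(axis=1)
print(changed.tolist())
```

For this data the result matches the mask obtained from the concatenated column, so either form can feed the later steps.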
Create unique group ids with cumsum:
print ((a != a.shift()).cumsum())
0 1
1 2
2 2
3 3
4 3
5 3
6 4
dtype: int32
Get a boolean mask of the duplicated values:
print ((a != a.shift()).cumsum().duplicated())
0 False
1 False
2 True
3 False
4 True
5 True
6 False
dtype: bool
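Because the cumulative sum only increases on rows where a change was detected, a group id is a duplicate exactly on the rows where nothing changed. A small sketch, under that reasoning, suggesting the same mask can also be obtained by testing equality with the shifted series directly. The series `a` is rebuilt by hand here, and `via_groups` and `direct` are illustrative names.

```python
import pandas as pd

# Same values as df.A + df.B in the example above
a = pd.Series(["meY", "youX", "youX", "meX", "meX", "meX", "meZ"])

changed = a != a.shift()
via_groups = changed.cumsum().duplicated()  # the approach shown above
direct = a.eq(a.shift())                    # True where a row repeats the previous one

print(via_groups.tolist() == direct.tolist())
```

The group ids are still useful when you need them for something else, such as a later groupby over each run.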
Solution that replaces the values in B with A wherever the mask is True:
df.loc[(a != a.shift()).cumsum().duplicated(), 'B'] = 'A'
print (df)
A B C
0 me Y 1
1 you X 2
2 you A 3
3 me X 4
4 me A 5
5 me A 6
6 me Z 7
An alternative using Series.mask:
df.B = df.B.mask((a != a.shift()).cumsum().duplicated(), 'A')
print (df)
A B C
0 me Y 1
1 you X 2
2 you A 3
3 me X 4
4 me A 5
5 me A 6
6 me Z 7
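For readers unfamiliar with `Series.mask`: it replaces values where the condition is True and returns a new Series, while `Series.where` does the opposite and keeps values where the condition is True. A tiny sketch with made-up data (`b` and `cond` are illustrative, not from the question):

```python
import pandas as pd

b = pd.Series(["Y", "X", "X"])
cond = pd.Series([False, False, True])

masked = b.mask(cond, 'A')    # replace where cond is True
kept = b.where(~cond, 'A')    # keep where ~cond is True, replace elsewhere

print(masked.tolist())
print(kept.tolist())
```

Both calls leave `b` itself unchanged, which is why the answer assigns the result back to `df.B`.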
print (df2.equals(df))
True
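If this needs to be reused, the steps could be wrapped in a small helper. This is only a sketch: the function name `mark_repeats` and its parameters are hypothetical, and it compares the key columns directly rather than concatenating them.

```python
import pandas as pd

def mark_repeats(df, keys, target, marker):
    """Return a copy of df where `target` is set to `marker` on every row
    whose `keys` values all equal those of the row directly above it."""
    cols = df[keys]
    # The first row is compared with NaN, so it is never flagged as a repeat.
    repeated = cols.eq(cols.shift()).all(axis=1)
    out = df.copy()
    out.loc[repeated, target] = marker
    return out

df = pd.DataFrame({'A': ["me", "you", "you", "me", "me", "me", "me"],
                   'B': ["Y", "X", "X", "X", "X", "X", "Z"],
                   'C': ["1", "2", "3", "4", "5", "6", "7"]})

result = mark_repeats(df, keys=['A', 'B'], target='B', marker='A')
print(result)
```

The mask is computed from the original values before anything is overwritten, so replacing B with 'A' does not affect which rows are detected as repeats. Rows whose key values are NaN are never treated as repeats in this sketch, since NaN does not compare equal to NaN.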