Pandas drop_duplicates 有多个条件
Pandas drop_duplicates with multiple conditions
我有一些测量数据需要过滤,我将它们读取为数据帧数据,如下所示:
df
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
而且我需要同时使用两个不同的条件,即使用[=26=过滤'RequestTime''RequestID'和'ResponseTime''ResponseID' ](子集=) 同时。我已经使用 follow 命令来获取两个条件中每一个的过滤结果:
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['ResponseTime','ResponseID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['RequestTime','RequestID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
但是如何结合这两个条件来删除重复的第 3 行和第 7 行?
IIUC,
m = ~(df.duplicated(subset=['RequestTime','RequestID']) | df.duplicated(subset=['ResponseTime', 'ResponseID']))
df[m]
输出:
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
创建掩码(布尔系列)以对数据帧进行布尔索引。
或连锁方法:
df.drop_duplicates(subset=['RequestTime', 'RequestID']).drop_duplicates(subset=['ResponseTime', 'ResponseID'])
我有一些测量数据需要过滤,我将它们读取为数据帧数据,如下所示:
df
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
而且我需要同时使用两个不同的条件,即使用[=26=过滤'RequestTime''RequestID'和'ResponseTime''ResponseID' ](子集=) 同时。我已经使用 follow 命令来获取两个条件中每一个的过滤结果:
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['ResponseTime','ResponseID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['RequestTime','RequestID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
但是如何结合这两个条件来删除重复的第 3 行和第 7 行?
IIUC,
m = ~(df.duplicated(subset=['RequestTime','RequestID']) | df.duplicated(subset=['ResponseTime', 'ResponseID']))
df[m]
输出:
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
创建掩码(布尔系列)以对数据帧进行布尔索引。
或连锁方法:
df.drop_duplicates(subset=['RequestTime', 'RequestID']).drop_duplicates(subset=['ResponseTime', 'ResponseID'])