How to keep a specific duplicate from among duplicates?

I have a .csv file with these columns:

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100,LS3,SS3,Volume,Volume MA

It has many rows, for example:

2022-04-08T02:00:00+02:00,43.431,43.44,43.431,43.44,44.669,43.58332033414956,43.28818411430672,43.11250779297169,42.91223678664976,,,78.07,

It also contains duplicates, for example these 4 rows, which differ in the Extremum column, like this:

2022-04-07 17:10:25,41.622,41.625,41.622,41.625,43.6,42.38191401399852,42.05078384304666,41.85368255081341,41.6289870776675,41.007714285714286,,6.99,571.0029999999954
2022-04-07 17:10:25,41.622,41.625,41.622,41.625,41.589,42.64812186602502,42.93603848979882,43.10741743252131,43.30278942722496,,,6.99,571.0029999999954
2022-04-07 17:10:25,41.622,41.625,41.622,41.625,43.6,42.38191401399852,42.05078384304666,41.85368255081341,41.6289870776675,41.007714285714286,,6.99,571.0029999999954
2022-04-07 17:10:25,41.622,41.625,41.622,41.625,43.6,42.38191401399852,42.05078384304666,41.85368255081341,41.6289870776675,41.007714285714286,,6.99,571.0029999999954

The data is sorted by 'time' with axis=0 (column A in the spreadsheet, i.e. column 0):

csvData.sort_values(by=["time"],axis=0,ascending=True,inplace=True,na_position='first')

The timestamp 17:10:25 appears 4 times. How do I throw away the row that does not match the others?

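Here the Extremum values are 41.589, 43.6, 43.6, 43.6. The 41.589 row is wrong and has to go, and only 1 copy of the remaining 3 duplicates should be kept. drop_duplicates can remove duplicates, but it does not hand me all 4 duplicates to work with: it can only be set in 3 ways, keep='first', keep='last' or keep=False, and there is no keep=True. I need all 4 duplicates returned so I can check which 1 of the 4 is bad before reducing them to a single unique row, in this case the correct one with 43.6. Does anyone know how to do this? I have seen some ideas on Stack Overflow, but I don't understand them well enough to apply them to my case, so I'm asking for help.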

You can use duplicated twice with two different modes: keep=False and another mode of your choice. Then compute a boolean mask from the two and use it for slicing.

Assuming this example dataset:

  date col  other
0    a   a      0
1    a   a      1
2    a   X      2   # unique
3    a   a      3
4    b   Y      4   # unique
5    b   b      5
6    b   b      6
7    b   b      7

You can use:

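# m1: True for every repeat after the first occurrence (default keep='first')
m1 = df.duplicated(subset=['date','col'])
# m2: True for every row that belongs to a duplicated group
m2 = df.duplicated(subset=['date','col'], keep=False)

# the masks differ only on the first occurrence of each duplicated group
df2 = df[m1!=m2]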

Output:

  date col  other
0    a   a      0
5    b   b      5
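
If you want to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is my reconstruction of the example dataset above, so treat the exact dtypes as an assumption:

```python
import pandas as pd

# Example dataset rebuilt from the table above (reconstruction, not the original file)
df = pd.DataFrame({
    'date':  list('aaaabbbb'),
    'col':   list('aaXaYbbb'),
    'other': range(8),
})

m1 = df.duplicated(subset=['date', 'col'])              # repeats after the first occurrence
m2 = df.duplicated(subset=['date', 'col'], keep=False)  # every member of a duplicated group

# Keep only the first occurrence of each duplicated group
df2 = df[m1 != m2]
print(df2)
```

Note that rows which are unique on ('date', 'col') are False in both masks, so they are dropped as well.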

Intermediates:

  date col  other     m1     m2  m1!=m2
0    a   a      0  False   True    True
1    a   a      1   True   True   False
2    a   X      2  False  False   False
3    a   a      3   True   True   False
4    b   Y      4  False  False   False
5    b   b      5  False   True    True
6    b   b      6   True   True   False
7    b   b      7   True   True   False
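
Applied to the data in the question, the same idea might look like the sketch below. It assumes that 'time' plus 'Extremum' is enough to tell a good duplicate from the odd one out, and it uses a hypothetical two-column miniature of the file. Because the mask above also drops rows that were never duplicated, the sketch adds an extra mask (my addition, not part of the approach above) that keeps timestamps occurring only once:

```python
import pandas as pd

# Hypothetical miniature of the question's data: only two of the columns are kept
csvData = pd.DataFrame({
    'time': ['2022-04-07 17:10:25'] * 4 + ['2022-04-08 02:00:00'],
    'Extremum': [43.6, 41.589, 43.6, 43.6, 44.669],
})

key = ['time', 'Extremum']
m1 = csvData.duplicated(subset=key)              # repeats after the first occurrence
m2 = csvData.duplicated(subset=key, keep=False)  # every member of a duplicated group

# Assumption: a timestamp that occurs only once has nothing to compare against, so keep it
single_time = ~csvData.duplicated(subset=['time'], keep=False)

result = csvData[(m1 != m2) | single_time]
print(result)
```

This relies on the bad row being unique within its timestamp. If a wrong value were itself repeated, one row per distinct value would survive and you would need another rule to pick between them.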