三列时最大的五个数
Largest five numbers when there are three columns
a) ID重复时如何找出最大的5个SNR?而且我还想要所有这三列作为输出。
b) 我也想输出删除的行。
FIT ID SNR
1011563.fit, J16142485-3141000 , 36
1011729.fit, J17210134-3757437 , 18
1011730.fit, J17210134-3757437 , 20
1011731.fit, J17210134-3757437 , 20
1011732.fit, J17210134-3757437 , 13
1011914.fit, J17210134-3757437 , 38
1011915.fit, J17210134-3757437 , 26
1011916.fit, J17210134-3757437 , 19
1011917.fit, J17210134-3757437 , 47
1011918.fit, J17210134-3757437 , 25 ´´´
Expected output for a.
FITS ID SNR
```8 1011917.fit J17210134-3757437 47
5 1011914.fit J17210134-3757437 38
0 1011563.fit J16142485-3141000 36
6 1011915.fit J17210134-3757437 26
9 1011918.fit J17210134-3757437 25
2 1011730.fit J17210134-3757437 20 ´´´
Output b)
``` FITS ID SNR
1 1011729.fit J17210134-3757437 18
6 1011915.fit J17210134-3757437 26
7 1011916.fit J17210134-3757437 19
8 1011917.fit J17210134-3757437 47´´´
As you can see SNR "6 1011915.fit J17210134-3757437 26" and
"8 1011917.fit J17210134-3757437 47" are repeated. But I want this only as output a and not b.
在排序后的数据帧上使用groupby
+head
获取索引,然后切片:
idx = df.sort_values(by='SNR', ascending=False).groupby('ID').head(5).index
df2 = df.loc[idx]
输出:
FITS ID SNR
0 1004234.fits J16355032-2814188 714
4 1004238.fits J16355032-2814188 690
11 1004245.fits J16355032-2814188 645
8 1004242.fits J16355032-2814188 635
9 1004243.fits J16355032-2814188 522
17 1005114.fits J22154748+4954052 227
16 1005113.fits J22154748+4954052 212
13 1004476.fits J22152631+4958343 162
19 1005116.fits J22154748+4954052 160
18 1005115.fits J22154748+4954052 148
15 1004478.fits J22152631+4958343 103
14 1004477.fits J22152631+4958343 76
12 1004475.fits J22152631+4958343 62
其他行:
df3 = df.loc[df.index.difference(idx)]
输出:
FITS ID SNR
1 1004235.fits J16355032-2814188 444
2 1004236.fits J16355032-2814188 331
3 1004237.fits J16355032-2814188 492
5 1004239.fits J16355032-2814188 491
6 1004240.fits J16355032-2814188 489
7 1004241.fits J16355032-2814188 382
10 1004244.fits J16355032-2814188 385
a) ID重复时如何找出最大的5个SNR?而且我还想要所有这三列作为输出。 b) 我也想输出删除的行。
FIT ID SNR
1011563.fit, J16142485-3141000 , 36
1011729.fit, J17210134-3757437 , 18
1011730.fit, J17210134-3757437 , 20
1011731.fit, J17210134-3757437 , 20
1011732.fit, J17210134-3757437 , 13
1011914.fit, J17210134-3757437 , 38
1011915.fit, J17210134-3757437 , 26
1011916.fit, J17210134-3757437 , 19
1011917.fit, J17210134-3757437 , 47
1011918.fit, J17210134-3757437 , 25 ´´´
Expected output for a.
FITS ID SNR
```8 1011917.fit J17210134-3757437 47
5 1011914.fit J17210134-3757437 38
0 1011563.fit J16142485-3141000 36
6 1011915.fit J17210134-3757437 26
9 1011918.fit J17210134-3757437 25
2 1011730.fit J17210134-3757437 20 ´´´
Output b)
``` FITS ID SNR
1 1011729.fit J17210134-3757437 18
6 1011915.fit J17210134-3757437 26
7 1011916.fit J17210134-3757437 19
8 1011917.fit J17210134-3757437 47´´´
As you can see SNR "6 1011915.fit J17210134-3757437 26" and
"8 1011917.fit J17210134-3757437 47" are repeated. But I want this only as output a and not b.
在排序后的数据帧上使用groupby
+head
获取索引,然后切片:
idx = df.sort_values(by='SNR', ascending=False).groupby('ID').head(5).index
df2 = df.loc[idx]
输出:
FITS ID SNR
0 1004234.fits J16355032-2814188 714
4 1004238.fits J16355032-2814188 690
11 1004245.fits J16355032-2814188 645
8 1004242.fits J16355032-2814188 635
9 1004243.fits J16355032-2814188 522
17 1005114.fits J22154748+4954052 227
16 1005113.fits J22154748+4954052 212
13 1004476.fits J22152631+4958343 162
19 1005116.fits J22154748+4954052 160
18 1005115.fits J22154748+4954052 148
15 1004478.fits J22152631+4958343 103
14 1004477.fits J22152631+4958343 76
12 1004475.fits J22152631+4958343 62
其他行:
df3 = df.loc[df.index.difference(idx)]
输出:
FITS ID SNR
1 1004235.fits J16355032-2814188 444
2 1004236.fits J16355032-2814188 331
3 1004237.fits J16355032-2814188 492
5 1004239.fits J16355032-2814188 491
6 1004240.fits J16355032-2814188 489
7 1004241.fits J16355032-2814188 382
10 1004244.fits J16355032-2814188 385