重复数字时的数据框
Dataframing when the numbers are repeated
a) ID和SNR重复时,如何找出最大的五个SNR?而且我还想要所有这三列作为输出。
b) 我也想输出删除的行。
FIT ID SNR
1011563.fit, J16142485-3141000 , 36
1011729.fit, J17210134-3757437 , 18
1011730.fit, J17210134-3757437 , 20
1011731.fit, J17210134-3757437 , 20
1011732.fit, J17210134-3757437 , 13
1011914.fit, J17210134-3757437 , 38
1011915.fit, J17210134-3757437 , 26
1011916.fit, J17210134-3757437 , 19
1011917.fit, J17210134-3757437 , 47
1011918.fit, J17210134-3757437 , 25 ´´´
The result should look somewhat like this.
Expected output for a.
FITS ID SNR
```8 1011917.fit J17210134-3757437 47
5 1011914.fit J17210134-3757437 38
0 1011563.fit J16142485-3141000 36
6 1011915.fit J17210134-3757437 26
9 1011918.fit J17210134-3757437 25
3 1011731.fit J17210134-3757437 20
2 1011730.fit J17210134-3757437 20 ´´´
Expected output for b)
``` FITS ID SNR
1 1011729.fit J17210134-3757437 18
6 1011915.fit J17210134-3757437 26
7 1011916.fit J17210134-3757437 19
4 1011732.fit J17210134-3757437 13´´´
你可以得到每组最大值的最小值,然后切片:
thresh = df.groupby('ID')['SNR'].nlargest(5).groupby(level=0).min()
m = df['ID'].map(thresh).le(df['SNR'])
a = df[m]
b = df[~m]
输出:
# tresh
ID
J16142485-3141000 36
J17210134-3757437 20
Name: SNR, dtype: int64
# a
FIT ID SNR
0 1011563.fit J16142485-3141000 36
2 1011730.fit J17210134-3757437 20
3 1011731.fit J17210134-3757437 20
5 1011914.fit J17210134-3757437 38
6 1011915.fit J17210134-3757437 26
8 1011917.fit J17210134-3757437 47
9 1011918.fit J17210134-3757437 25
# b
FIT ID SNR
1 1011729.fit J17210134-3757437 18
4 1011732.fit J17210134-3757437 13
7 1011916.fit J17210134-3757437 19
您可以使用 groupby_rank
:
rank = df.groupby('ID')['SNR'].rank(method='dense', ascending=False)
a = df[rank <= 5]
b = df[rank > 5]
输出:
>>> a
FIT ID SNR
0 1011563.fit J16142485-3141000 36
2 1011730.fit J17210134-3757437 20
3 1011731.fit J17210134-3757437 20
5 1011914.fit J17210134-3757437 38
6 1011915.fit J17210134-3757437 26
8 1011917.fit J17210134-3757437 47
9 1011918.fit J17210134-3757437 25
>>> b
FIT ID SNR
1 1011729.fit J17210134-3757437 18
4 1011732.fit J17210134-3757437 13
7 1011916.fit J17210134-3757437 19
a) ID和SNR重复时,如何找出最大的五个SNR?而且我还想要所有这三列作为输出。 b) 我也想输出删除的行。
FIT ID SNR
1011563.fit, J16142485-3141000 , 36
1011729.fit, J17210134-3757437 , 18
1011730.fit, J17210134-3757437 , 20
1011731.fit, J17210134-3757437 , 20
1011732.fit, J17210134-3757437 , 13
1011914.fit, J17210134-3757437 , 38
1011915.fit, J17210134-3757437 , 26
1011916.fit, J17210134-3757437 , 19
1011917.fit, J17210134-3757437 , 47
1011918.fit, J17210134-3757437 , 25 ´´´
The result should look somewhat like this.
Expected output for a.
FITS ID SNR
```8 1011917.fit J17210134-3757437 47
5 1011914.fit J17210134-3757437 38
0 1011563.fit J16142485-3141000 36
6 1011915.fit J17210134-3757437 26
9 1011918.fit J17210134-3757437 25
3 1011731.fit J17210134-3757437 20
2 1011730.fit J17210134-3757437 20 ´´´
Expected output for b)
``` FITS ID SNR
1 1011729.fit J17210134-3757437 18
6 1011915.fit J17210134-3757437 26
7 1011916.fit J17210134-3757437 19
4 1011732.fit J17210134-3757437 13´´´
你可以得到每组最大值的最小值,然后切片:
thresh = df.groupby('ID')['SNR'].nlargest(5).groupby(level=0).min()
m = df['ID'].map(thresh).le(df['SNR'])
a = df[m]
b = df[~m]
输出:
# tresh
ID
J16142485-3141000 36
J17210134-3757437 20
Name: SNR, dtype: int64
# a
FIT ID SNR
0 1011563.fit J16142485-3141000 36
2 1011730.fit J17210134-3757437 20
3 1011731.fit J17210134-3757437 20
5 1011914.fit J17210134-3757437 38
6 1011915.fit J17210134-3757437 26
8 1011917.fit J17210134-3757437 47
9 1011918.fit J17210134-3757437 25
# b
FIT ID SNR
1 1011729.fit J17210134-3757437 18
4 1011732.fit J17210134-3757437 13
7 1011916.fit J17210134-3757437 19
您可以使用 groupby_rank
:
rank = df.groupby('ID')['SNR'].rank(method='dense', ascending=False)
a = df[rank <= 5]
b = df[rank > 5]
输出:
>>> a
FIT ID SNR
0 1011563.fit J16142485-3141000 36
2 1011730.fit J17210134-3757437 20
3 1011731.fit J17210134-3757437 20
5 1011914.fit J17210134-3757437 38
6 1011915.fit J17210134-3757437 26
8 1011917.fit J17210134-3757437 47
9 1011918.fit J17210134-3757437 25
>>> b
FIT ID SNR
1 1011729.fit J17210134-3757437 18
4 1011732.fit J17210134-3757437 13
7 1011916.fit J17210134-3757437 19