Ranking pandas data frame rows on multiple columns

I'm new to Pandas. I'm trying to understand how to do something in pandas that I already do in SQL.

I have a table like this:

Account Company Blotter
112233  10      62
233445  12      62
233445  10      66
343454  21      66
343454  21      64
768876  25      54

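In SQL, if a given account appears in multiple rows, I would use rank(). If I want to prioritize a particular company, I add a CASE expression that forces that company to be ranked first. I can also use the Blotter column as an additional ranking criterion. For example: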

rank() over(
    partition by ACCOUNT 
    order by case 
                when COMPANY='12' then 0 
                when COMPANY='21' then 1 
                else COMPANY 
             end, 
             case 
                when BLOTTER ='66' then 0 
                else BLOTTER 
             end
)

Expected output:

   Account  Company  Blotter  rank
0   112233       10       62     1
1   233445       12       62     1
2   233445       10       66     2
3   343454       21       66     1
4   343454       21       64     2
5   768876       25       54     1
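The CASE expressions in the ORDER BY clause only compute sort keys. As a starting point, here is a minimal sketch of those two keys alone in pandas, using nested numpy.where calls as the CASE WHEN equivalent. It assumes Company and Blotter are integers (the SQL compares them to the strings '12', '21' and '66'); the column names company_key and blotter_key are illustrative only.

```python
import numpy as np
import pandas as pd

# The sample table from the question
df = pd.DataFrame(
    {
        "Account": [112233, 233445, 233445, 343454, 343454, 768876],
        "Company": [10, 12, 10, 21, 21, 25],
        "Blotter": [62, 62, 66, 66, 64, 54],
    }
)

# CASE WHEN COMPANY = 12 THEN 0 WHEN COMPANY = 21 THEN 1 ELSE COMPANY END
company_key = np.where(
    df["Company"].eq(12), 0, np.where(df["Company"].eq(21), 1, df["Company"])
)

# CASE WHEN BLOTTER = 66 THEN 0 ELSE BLOTTER END
blotter_key = np.where(df["Blotter"].eq(66), 0, df["Blotter"])

# Attach the ORDER BY keys as extra columns (illustrative names)
keys = df.assign(company_key=company_key, blotter_key=blotter_key)
print(keys)
```

Sorting by these two columns within each Account is what the ORDER BY part of the window does; the PARTITION BY and rank() parts still have to be added on top.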

The DataFrame sort_values method may be what you're looking for.

import pandas as pd

data = [
[112233, 10, 62],
[233445, 12, 62],
[233445, 10, 66],
[343454, 21, 66],
[343454, 21, 64],
[768876, 25, 54]]

df = pd.DataFrame(data, columns=['Account', 'Company', 'Blotter'])
df
   Account  Company  Blotter
0   112233       10       62
1   233445       12       62
2   233445       10       66
3   343454       21       66
4   343454       21       64
5   768876       25       54
df_shuffled = df.sample(frac=1, random_state=0)   # shuffle the rows
df_shuffled
   Account  Company  Blotter
5   768876       25       54
2   233445       10       66
1   233445       12       62
3   343454       21       66
0   112233       10       62
4   343454       21       64
df_shuffled.sort_values(by=['Account', 'Company', 'Blotter'],
                        ascending=[True, False, False])
   Account  Company  Blotter
0   112233       10       62
1   233445       12       62
2   233445       10       66
3   343454       21       66
4   343454       21       64
5   768876       25       54
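Descending Company and Blotter happen to reproduce the CASE priorities for this sample (12 before 10, 66 before 64), but not in general: within one account, descending order would put company 25 before 21, while the SQL puts 21 first. A minimal sketch that keeps the explicit priorities, assuming pandas 1.1 or later (where sort_values accepts a key callable that is applied to each sort column separately). COMPANY_PRIORITY, BLOTTER_PRIORITY and priority_key are hypothetical names introduced here; groupby(...).cumcount() then turns the sorted order into a 1-based position per Account.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Account": [112233, 233445, 233445, 343454, 343454, 768876],
        "Company": [10, 12, 10, 21, 21, 25],
        "Blotter": [62, 62, 66, 66, 64, 54],
    }
)

# Hypothetical priority tables mirroring the SQL CASE expressions
COMPANY_PRIORITY = {12: 0, 21: 1}
BLOTTER_PRIORITY = {66: 0}


def priority_key(col: pd.Series) -> pd.Series:
    """Called by sort_values once per sort column; col.name is the column label."""
    if col.name == "Company":
        return col.map(COMPANY_PRIORITY).fillna(col)  # ELSE COMPANY
    if col.name == "Blotter":
        return col.map(BLOTTER_PRIORITY).fillna(col)  # ELSE BLOTTER
    return col  # Account is sorted as-is


# ORDER BY with the explicit priorities, grouped rows kept together by Account
ordered = df.sort_values(["Account", "Company", "Blotter"], key=priority_key)

# Position within each Account in the sorted order, starting at 1;
# assignment aligns on the original index, so df keeps its row order
df["rank"] = ordered.groupby("Account").cumcount() + 1
print(df)
```

Because cumcount numbers rows consecutively, tied rows would get distinct positions, which matches ROW_NUMBER() rather than RANK(); the sample data has no ties, so both agree here.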

You might want to try this:

import pandas as pd

df = pd.DataFrame({'Account': [112233, 233445, 233445, 343454, 343454, 768876],
                   'Company': [10, 12, 10, 21, 21, 25],
                   'Blotter': [62, 62, 66, 66, 64, 54]})

# recompute the sort criteria for company and blotter
# (the equivalent of the two CASE expressions)
ser_sort_company = df['Company'].map({12: 0, 21: 1}).fillna(df['Company'])
ser_sort_blotter = df['Blotter'].map({66: 0}).fillna(df['Blotter'])
df['rank'] = (df
     # temporarily create sort columns
     .assign(sort_company=ser_sort_company)
     .assign(sort_blotter=ser_sort_blotter)
     # temporarily sort the result
     # this replaces the ORDER BY part
     .sort_values(['sort_company', 'sort_blotter'])
     # group by Account to replace the PARTITION BY part
     .groupby('Account')
     # position of the record within its group, starting at 1
     # (behaves like ROW_NUMBER(): tied rows get distinct numbers)
     .cumcount() + 1
)

df

This gives:

   Account  Company  Blotter  rank
0   112233       10       62     1
1   233445       12       62     1
2   233445       10       66     2
3   343454       21       66     1
4   343454       21       64     2
5   768876       25       54     1
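Numbering rows within each group numbers tied rows consecutively, like ROW_NUMBER(). If you need SQL RANK() semantics, where rows with equal sort keys share a rank and the next rank skips ahead, one possible sketch is to number each distinct (sort_company, sort_blotter) pair in ascending order with groupby(...).ngroup() (groupby sorts its keys by default) and then rank those numbers within each Account using rank(method='min'). The last row of the frame below is a hypothetical duplicate, not part of the question, added only to produce a tie; order is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # last row is a hypothetical duplicate of row 3, added to create a tie
        "Account": [112233, 233445, 233445, 343454, 343454, 768876, 343454],
        "Company": [10, 12, 10, 21, 21, 25, 21],
        "Blotter": [62, 62, 66, 66, 64, 54, 66],
    }
)

# CASE expressions as sort keys
sort_company = df["Company"].map({12: 0, 21: 1}).fillna(df["Company"])
sort_blotter = df["Blotter"].map({66: 0}).fillna(df["Blotter"])

# Dense number for each distinct (sort_company, sort_blotter) pair,
# assigned in ascending lexicographic order of the pair
order = (
    df.assign(sort_company=sort_company, sort_blotter=sort_blotter)
    .groupby(["sort_company", "sort_blotter"])
    .ngroup()
)

# RANK() OVER (PARTITION BY Account ORDER BY ...): ties share the lowest rank
df["rank"] = order.groupby(df["Account"]).rank(method="min").astype(int)
print(df)
```

For account 343454 the two rows with Blotter 66 both get rank 1 and the row with Blotter 64 gets rank 3, which is what RANK() returns; method='dense' would give 2 instead, like DENSE_RANK().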