Ranking pandas data frame rows on multiple columns
I am new to pandas. I am trying to work out how to do something in pandas that I already do in SQL.
I have a table like this:
Account Company Blotter
112233 10 62
233445 12 62
233445 10 66
343454 21 66
343454 21 64
768876 25 54
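In SQL, if a given account appears in multiple rows, I use rank(). If I want to prioritize a certain company, I add a CASE expression that forces that company to rank first. I can also use the Blotter column as an additional ranking criterion.
For example: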
rank() over(
partition by ACCOUNT
order by case
when COMPANY='12' then 0
when COMPANY='21' then 1
else COMPANY
end,
case
when BLOTTER ='66' then 0
else BLOTTER
end
)
Expected output:
Account Company Blotter rank
0 112233 10 62 1
1 233445 12 62 1
2 233445 10 66 2
3 343454 21 66 1
4 343454 21 64 2
5 768876 25 54 1
The sort_values method of a pandas DataFrame may be what you are looking for.
import pandas as pd
data = [
[112233, 10, 62],
[233445, 12, 62],
[233445, 10, 66],
[343454, 21, 66],
[343454, 21, 64],
[768876, 25, 54]]
df = pd.DataFrame(data, columns=['Account', 'Company', 'Blotter'])
df
Account Company Blotter
0 112233 10 62
1 233445 12 62
2 233445 10 66
3 343454 21 66
4 343454 21 64
5 768876 25 54
df_shuffled = df.sample(frac=1, random_state=0) # shuffle the rows
df_shuffled
Account Company Blotter
5 768876 25 54
2 233445 10 66
1 233445 12 62
3 343454 21 66
0 112233 10 62
4 343454 21 64
df_shuffled.sort_values(by=['Account', 'Company', 'Blotter'],
ascending=[True, False, False])
Account Company Blotter
0 112233 10 62
1 233445 12 62
2 233445 10 66
3 343454 21 66
4 343454 21 64
5 768876 25 54
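The call above sorts on the raw column values, so it does not apply the CASE priorities from the question. One possible way to encode them is the key argument of sort_values, sketched below under the assumption of pandas 1.1 or later. The names PRIORITY and priority_key are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[112233, 10, 62], [233445, 12, 62], [233445, 10, 66],
     [343454, 21, 66], [343454, 21, 64], [768876, 25, 54]],
    columns=['Account', 'Company', 'Blotter'])
df_shuffled = df.sample(frac=1, random_state=0)  # shuffle the rows

# Hypothetical lookup mirroring the CASE expressions in the question:
# company 12 first, then 21; blotter 66 first; everything else keeps its value.
PRIORITY = {'Company': {12: 0, 21: 1}, 'Blotter': {66: 0}}

def priority_key(col):
    # sort_values applies the key to each column listed in `by` independently
    if col.name not in PRIORITY:
        return col
    return col.map(PRIORITY[col.name]).fillna(col)

df_sorted = df_shuffled.sort_values(
    by=['Account', 'Company', 'Blotter'], key=priority_key)
print(df_sorted)
```

This only orders the rows. A rank column still has to be derived per account, for example by numbering the rows within each Account group.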
You might want to try this:
# recompute the sort criteria for company and blotter
ser_sort_company = df['Company'].map({12: 0, 21: 1}).fillna(df['Company'])
# the CASE on BLOTTER in the question only promotes 66, so only 66 is remapped
ser_sort_blotter = df['Blotter'].map({66: 0}).fillna(df['Blotter'])
df['rank'] = (df
    # temporarily create sort columns
    .assign(sort_company=ser_sort_company)
    .assign(sort_blotter=ser_sort_blotter)
    # temporarily sort the result
    # this replaces the ORDER BY part
    .sort_values(['sort_company', 'sort_blotter'])
    # group by Account to replace the PARTITION BY part
    .groupby('Account')
    # get the position of the record in the group (RANK part)
    .cumcount() + 1
)
df
This evaluates to:
Account Company Blotter rank
0 112233 10 62 1
1 233445 12 62 1
2 233445 10 66 2
3 343454 21 66 1
4 343454 21 64 2
5 768876 25 54 1
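Numbering rows by their position within each group behaves like SQL ROW_NUMBER(): two rows with identical sort keys still get different numbers, whereas RANK() would give them the same value and then skip. If tie handling matters, a rough sketch of RANK()-style behavior follows. The small data set is made up for illustration, and the priority mappings are read off the CASE expressions in the question.

```python
import pandas as pd

# Hypothetical data with a deliberate tie in account 1
df = pd.DataFrame({
    'Account': [1, 1, 1, 2],
    'Company': [12, 12, 10, 21],
    'Blotter': [66, 66, 62, 64],
})

keyed = df.assign(
    sort_company=df['Company'].map({12: 0, 21: 1}).fillna(df['Company']),
    sort_blotter=df['Blotter'].map({66: 0}).fillna(df['Blotter']),
)

# With sort=True, ngroup() numbers the distinct key tuples in sorted order,
# so equal (Account, sort_company, sort_blotter) tuples share one number.
order = keyed.groupby(
    ['Account', 'sort_company', 'sort_blotter'], sort=True).ngroup()

# Rank that number within each account; method='min' gives ties the same
# rank and leaves a gap afterwards, as SQL RANK() does.
df['rank'] = order.groupby(keyed['Account']).rank(method='min').astype(int)
print(df)
```

Switching to method='dense' would correspond to DENSE_RANK(), and method='first' would break ties by row order, similar to ROW_NUMBER().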