How is rank calculated in pandas?

Question

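I am confused about how the rank of a Series is computed. My understanding is that rank is calculated from the highest value to the lowest value in a Series, and that when two numbers are equal, pandas assigns them the average of their ranks.

In this example, the highest value is 7. Why do we get rank 5.5 for the number 7 and rank 1.5 for the number 4?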

import pandas as pd

S1 = pd.Series([7,6,7,5,4,4])
S1.rank()

Output:

0    5.5
1    4.0
2    5.5
3    3.0
4    1.5
5    1.5
dtype: float64

Answer 1

This is how rank is calculated:

Sort the elements in ascending order and assign ranks starting from 1 for the lowest element.

Elements - 4, 4, 5, 6, 7, 7
Ranks    - 1, 2, 3, 4, 5, 6

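Now consider the duplicates: average the ranks they occupy and assign that average to each of them.

Since 4 appears twice, the final rank of each occurrence is the average of 1 and 2, which is 1.5. In the same way, for 7 the final rank of each occurrence is the average of 5 and 6, which is 5.5.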

Elements   -   4,   4, 5, 6,   7,   7
Ranks      -   1,   2, 3, 4,   5,   6
Final Rank - 1.5, 1.5, 3, 4, 5.5, 5.5
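The question assumed that ranking runs from the highest value down. As a minimal sketch of the difference, the snippet below compares the default ascending ranking with `ascending=False`, which is a parameter of `Series.rank`. The variable names `asc` and `desc` are only illustrative.

```python
import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])

# Default: ascending=True, so the smallest value gets rank 1
asc = S1.rank()

# ascending=False: the largest value gets rank 1 instead
desc = S1.rank(ascending=False)

# Show both rankings next to the original values
print(pd.DataFrame({'value': S1, 'ascending': asc, 'descending': desc}))
```

With `ascending=False` the two 7s share ranks 1 and 2, so each gets 1.5, and the two 4s share ranks 5 and 6, so each gets 5.5.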

Answer 2

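As Joachim commented, the rank function accepts a method argument whose default value is 'average'. That is, the final rank of tied values is the average of the ranks they would otherwise occupy.

According to the documentation, the options for method are: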

method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'

How to rank the group of records that have the same value (i.e. ties):

average: average rank of the group

min: lowest rank in the group

max: highest rank in the group

first: ranks assigned in order they appear in the array

dense: like 'min', but rank always increases by 1 between groups
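To see how the five options differ on the data from the question, here is a small sketch that ranks the same Series once per method and collects the results in one table. The names `methods` and `comparison` are illustrative, not part of pandas.

```python
import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])

# One column per tie-breaking method
methods = ['average', 'min', 'max', 'first', 'dense']
comparison = pd.DataFrame({m: S1.rank(method=m) for m in methods})

# Put the original values in the first column for reference
comparison.insert(0, 'value', S1)
print(comparison)
```

Only the tied values (the 7s and the 4s) change between columns, apart from 'dense', which also removes the gaps that ties leave in the ranking.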

For example, let's try method='dense'. S1.rank(method='dense') gives:

0    4.0
1    3.0
2    4.0
3    2.0
4    1.0
5    1.0
dtype: float64

This is somewhat equivalent to factorize.
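The relationship to factorize can be sketched as follows, assuming the Series contains no missing values. `pd.factorize` with `sort=True` returns zero-based codes into the sorted unique values, so adding 1 reproduces the dense ranks. The name `dense_from_codes` is illustrative.

```python
import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])

# codes are zero-based positions into the sorted unique values
codes, uniques = pd.factorize(S1, sort=True)

# Shift to one-based and convert to float to match the dtype rank() returns
dense_from_codes = pd.Series(codes + 1, index=S1.index, dtype='float64')

dense = S1.rank(method='dense')
print(dense_from_codes.equals(dense))
```

The two differ once missing values are involved: by default factorize assigns the code -1 to NaN, while rank keeps NaN as NaN.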

Update: following your question, let's try writing a function that behaves like S1.rank():

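import numpy as np
import pandas as pd

def my_rank(s):
    # sort s by value; mergesort is stable, so ties keep their original order
    s_sorted = s.sort_values(kind='mergesort')

    # incremental ranks 1..n in sorted order,
    # equivalent to s.rank(method='first')
    ranks = pd.Series(np.arange(len(s_sorted)) + 1, index=s_sorted.index)

    # average the incremental ranks within each group of equal values
    avg_ranks = ranks.groupby(s_sorted).transform('mean')

    # note: rows come back in sorted-value order; the index labels still match s
    return avg_ranks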
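As a quick sanity check of the idea behind my_rank, the sketch below builds the averaged ranks directly from `rank(method='first')` and compares them with the default `rank()`. It assumes there are no missing values. The names `first` and `averaged` are illustrative.

```python
import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])

# Incremental ranks: ties are ranked in order of appearance
first = S1.rank(method='first')

# Average those ranks within each group of equal values
averaged = first.groupby(S1).transform('mean')

# Raises AssertionError if the two results differ
pd.testing.assert_series_equal(averaged, S1.rank())
print(averaged)
```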

Answer 3

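You are using the default ranking (method='average'). If you want tied values to receive the highest rank in their group, do the following: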

S1 = pd.Series([7,6,7,5,4,4])
S1.rank(method='max')

Here are all the rank methods that pandas supports: {'average', 'min', 'max', 'first', 'dense'}, default 'average'

# collect the rankings in a DataFrame, one column per variant
df = pd.DataFrame({'value': S1})
df['default_rank'] = S1.rank()
df['max_rank'] = S1.rank(method='max')
df['NA_bottom'] = S1.rank(na_option='bottom')
df['pct_rank'] = S1.rank(pct=True)
print(df)
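S1 has no missing values, so na_option makes no visible difference there. The sketch below uses a hypothetical Series `S2` with one NaN to show what `na_option` does, and computes `pct=True` on the original data. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (not from the question)
S2 = pd.Series([7, np.nan, 4])

keep = S2.rank()                       # default na_option='keep': NaN stays NaN
bottom = S2.rank(na_option='bottom')   # NaN gets the highest rank
top = S2.rank(na_option='top')         # NaN gets the lowest rank
print(pd.DataFrame({'value': S2, 'keep': keep, 'bottom': bottom, 'top': top}))

# pct=True returns ranks as a fraction of the number of values
S1 = pd.Series([7, 6, 7, 5, 4, 4])
pct = S1.rank(pct=True)
print(pct)
```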
