Python numpy percentile vs scipy percentileofscore

I'm confused about what I'm doing wrong.

I have the following code:

import numpy as np
from scipy import stats

df
Out[29]: array([66., 69., 67., 75., 69., 69.])

val = 73.94
z1 = stats.percentileofscore(df, val)
print(z1)
Out[33]: 83.33333333333334

np.percentile(df, z1)
Out[34]: 69.999999999

I expected np.percentile(df, z1) to give me back val = 73.94.

I think you're misunderstanding what percentileofscore and percentile actually do. They are not inverses of each other.
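One way to see that no exact inverse can exist is to note that several different scores share the same percentile rank. The snippet below is a minimal sketch using the array from the question; the extra scores 70.0 and 74.9 are illustrative values chosen here, not part of the original post.

```python
import numpy as np
from scipy import stats

df = np.array([66., 69., 67., 75., 69., 69.])

# Three different scores that all lie strictly between 69 and 75
# (70.0 and 74.9 are illustrative assumptions, 73.94 is from the question).
scores = [70.0, 73.94, 74.9]
ranks = [stats.percentileofscore(df, s) for s in scores]

# All three scores have five of the six elements below them,
# so they collapse onto the same percentile rank.
print(ranks)

# Going back through np.percentile can return only one value for that rank.
back = np.percentile(df, ranks[0])
print(back)
```

Because the mapping from score to rank is many-to-one, np.percentile has no way of knowing which of those scores you started from.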


From the documentation of scipy.stats.percentileofscore:

The percentile rank of a score relative to a list of scores.

A percentileofscore of, for example, 80% means that 80% of the scores in a are below the given score. In the case of gaps or ties, the exact definition depends on the optional keyword, kind.

So when you supply the value 73.94, there are 5 elements of df below that score, and 5/6 gives you the 83.3333% result.
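The count described above can be reproduced by hand. This is a minimal sketch that assumes the default kind of percentileofscore and a score that does not tie with any element, in which case the rank is simply the share of elements below the score.

```python
import numpy as np
from scipy import stats

df = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

# Count the elements strictly below the score and express it as a percentage.
n_below = int(np.sum(df < val))
manual_rank = 100.0 * n_below / df.size

# SciPy's result for comparison (default kind, no ties with val).
scipy_rank = stats.percentileofscore(df, val)

print(n_below, manual_rank, scipy_rank)
```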


Now, from the notes for numpy.percentile:

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.

The default interpolation parameter (called method in newer NumPy releases) is 'linear', so:

'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

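Since you passed 83.33 as the input argument, you are asking for the value that lies 83.33/100 of the way from the minimum to the maximum of the sorted array.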

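If you're interested in digging into the NumPy source code you can find the full implementation there, but here is a simplified look at the calculation being done: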

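ap = np.asarray(sorted(df))   # sorted copy: [66, 67, 69, 69, 69, 75]
Nx = df.shape[0]

# fractional index of the requested percentile in the sorted array
indices = z1 / 100 * (Nx - 1)                   # about 4.1667
indices_below = np.floor(indices).astype(int)   # 4
indices_above = indices_below + 1               # 5

# linear-interpolation weights of the two neighbouring elements
weight_above = indices - indices_below
weight_below = 1 - weight_above

x1 = ap[indices_below] * weight_below   # 57.50000000000004
x2 = ap[indices_above] * weight_above   # 12.499999999999956

x1 + x2

70.0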
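To check the walkthrough above against NumPy itself, the same steps can be wrapped in a small function and compared with np.percentile. This is only a sketch of the default linear rule, not NumPy's actual implementation; the name linear_percentile is a hypothetical helper introduced here, and the upper index is clamped so that q = 100 does not run past the end of the array.

```python
import numpy as np

def linear_percentile(values, q):
    """Linear-interpolation percentile of a 1-D sequence (illustrative helper)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    # Fractional position of the q-th percentile in the sorted data.
    position = q / 100.0 * (ordered.size - 1)
    lower = int(np.floor(position))
    # Clamp so that q = 100 stays inside the array.
    upper = min(lower + 1, ordered.size - 1)
    fraction = position - lower
    # Weighted average of the two neighbouring elements.
    return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction

df = np.array([66., 69., 67., 75., 69., 69.])
q = 100.0 * 5 / 6  # the percentile rank of 73.94 from the question

manual = linear_percentile(df, q)
reference = np.percentile(df, q)
print(manual, reference)
```

Both values come out at roughly 70, the point one sixth of the way from 69 to 75, which is why the original 73.94 is not recovered.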