Python numpy percentile vs scipy percentileofscore

Question

I'm confused about what I'm doing wrong.

I have the following code:

import numpy as np
from scipy import stats

df
Out[29]: array([66., 69., 67., 75., 69., 69.])

val = 73.94
z1 = stats.percentileofscore(df, val)
print(z1)
Out[33]: 83.33333333333334

np.percentile(df, z1)
Out[34]: 69.999999999

I expected np.percentile(df, z1) to give me back val = 73.94.

Answer 1

I think you have misunderstood what percentileofscore and percentile actually do. They are not inverses of each other.

From the documentation for scipy.stats.percentileofscore:

The percentile rank of a score relative to a list of scores.

A percentileofscore of, for example, 80% means that 80% of the scores in a are below the given score. In the case of gaps or ties, the exact definition depends on the optional keyword, kind.

So when you pass the value 73.94, 5 of the 6 elements in df fall below that score, and 5/6 gives you the 83.3333% result.
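The count can be reproduced by hand. The snippet below is a minimal sketch that reuses the array from the question under the hypothetical name `scores`. It also shows how the `kind` keyword mentioned in the documentation changes the answer when the score ties with elements of the array.

```python
import numpy as np
from scipy import stats

scores = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

# No element equals 73.94, so the rank is simply the share of values below it.
n_below = np.count_nonzero(scores < val)
manual_rank = 100.0 * n_below / scores.size
scipy_rank = stats.percentileofscore(scores, val)
print(n_below, manual_rank, scipy_rank)

# With ties (three elements equal 69) the `kind` keyword matters.
strict = stats.percentileofscore(scores, 69.0, kind="strict")  # share of values < 69
weak = stats.percentileofscore(scores, 69.0, kind="weak")      # share of values <= 69
print(strict, weak)
```

Here `strict` counts 2 of 6 values and `weak` counts 5 of 6, so the two ranks differ even though the score is the same.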

Now, from the Notes section of numpy.percentile:

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.

The default interpolation argument is 'linear', so:

'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

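Since you passed 83.33 as the input parameter, you are looking at the value 83.33/100 of the way from the minimum to the maximum by position in the sorted array, that is, at fractional index 0.8333 * (N - 1), not at a value that is 83.33% of the way between the smallest and largest numbers.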
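To make the "by position" reading concrete, here is a small sketch (the names `scores`, `ordered` and `grid_q` are illustrative, not from the question). It relies on the default linear interpolation: evenly spaced q values land exactly on the sorted elements, while q = 83.33 lands between the last two.

```python
import numpy as np

scores = np.array([66., 69., 67., 75., 69., 69.])
ordered = np.sort(scores)          # 66, 67, 69, 69, 69, 75
n = ordered.size

# q = 0, 20, 40, 60, 80, 100 correspond to sorted positions 0..5.
grid_q = np.linspace(0, 100, n)
on_grid = np.percentile(scores, grid_q)
print(on_grid)

# q = 83.33 maps to a fractional position between index 4 (69) and index 5 (75).
q = 500 / 6
position = q / 100 * (n - 1)
between = np.percentile(scores, q)
print(position, between)
```

The fractional position is about 4.17, so the result is 69 plus roughly one sixth of the gap up to 75.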

If you are interested in digging deeper, you can read the NumPy source code, but here is a simplified view of the calculation being performed:

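# df and z1 are the array and the percentile rank from the question
ap = np.asarray(sorted(df))
Nx = df.shape[0]

# fractional position of the z1-th percentile in the sorted array
indices = z1 / 100 * (Nx - 1)
indices_below = np.floor(indices).astype(int)
indices_above = indices_below + 1

# linear interpolation weights for the two neighbouring elements
weight_above = indices - indices_below
weight_below = 1 - weight_above

x1 = ap[indices_below] * weight_below   # 57.50000000000004
x2 = ap[indices_above] * weight_above   # 12.499999999999956

x1 + x2

70.0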
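If the goal is a q that np.percentile maps back to val, one possible approach is to invert the linear interpolation above rather than use percentileofscore. The following is only a sketch under stated assumptions: it assumes the default linear interpolation and that val lies between the minimum and maximum of the array (min <= val < max). The names `scores`, `ordered`, `j` and `frac` are illustrative.

```python
import numpy as np

scores = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

ordered = np.sort(scores)
n = ordered.size

# Index of the last sorted element that is <= val (assumes min <= val < max).
j = np.searchsorted(ordered, val, side="right") - 1

# Fraction of the way from ordered[j] to the next, strictly larger, element.
frac = (val - ordered[j]) / (ordered[j + 1] - ordered[j])

# Convert the fractional position back into a percentile.
q = 100.0 * (j + frac) / (n - 1)

print(q)
print(np.percentile(scores, q))
```

With this array the computed q is roughly 96.47 rather than 83.33, which is why the round trip in the question does not return 73.94.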

numpy

scipy

python-3.5