Python numpy percentile vs scipy percentileofscore
I'm confused about what I'm doing wrong.
I have the following code:
import numpy as np
from scipy import stats
df
Out[29]: array([66., 69., 67., 75., 69., 69.])
val = 73.94
z1 = stats.percentileofscore(df, val)
print(z1)
Out[33]: 83.33333333333334
np.percentile(df, z1)
Out[34]: 69.999999999
I expected np.percentile(df, z1) to give me back val = 73.94.
I think you have misunderstood what percentileofscore and percentile actually do. They are not inverses of each other.
From the documentation for scipy.stats.percentileofscore:
The percentile rank of a score relative to a list of scores.
A percentileofscore of, for example, 80% means that 80% of the scores in a are below the given score. In the case of gaps or ties, the exact definition depends on the optional keyword, kind.
So when you supply the value 73.94, there are 5 elements in df below that score, and 5/6 gives you the 83.3333% result.
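To make that count concrete, here is a minimal sketch that reproduces the 83.33 by hand. It assumes the score does not appear in the array, so ties and the kind keyword play no role; the names n_below and rank are just illustrative.

```python
import numpy as np

df = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

# Count how many elements are strictly below the score.
n_below = int(np.sum(df < val))

# Express that count as a percentage of the array length.
rank = n_below / df.size * 100

print(n_below)  # 5
print(rank)     # about 83.33, the value stored in z1 in the question
```

Note that the result depends only on how many values lie below val, not on how far below they are, so a lot of information about val is lost at this step.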
Now, from the Notes section of numpy.percentile:
Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.
The default interpolation parameter is 'linear', so:
'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
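Since you passed z1 (about 83.33) as the q argument, you are asking for the value that sits 83.33/100 of the way through the sorted array, going from its minimum to its maximum.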
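A quick way to see the "q/100 of the way through the sorted copy" behaviour is to ask for percentiles that land exactly on an index. This is only a sketch using the default linear interpolation (newer NumPy releases call this parameter method rather than interpolation); the dictionary name results is illustrative.

```python
import numpy as np

df = np.array([66., 69., 67., 75., 69., 69.])
# Sorted copy: [66., 67., 69., 69., 69., 75.], with indices 0..5

# With 6 elements, q = 0, 20, 80, 100 map to indices 0, 1, 4, 5.
results = {q: float(np.percentile(df, q)) for q in (0, 20, 80, 100)}

for q, value in results.items():
    print(q, value)
# 0 -> 66.0, 20 -> 67.0, 80 -> 69.0, 100 -> 75.0
```

Anything between q = 80 and q = 100 is interpolated between 69 and 75, which is why q = 83.33 comes out near 70 rather than 73.94.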
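If you are interested in digging deeper, the implementation is in the NumPy source, but here is a simplified view of the calculation being done: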
ap = np.asarray(sorted(df))                    # sorted copy: [66., 67., 69., 69., 69., 75.]
Nx = df.shape[0]
indices = z1 / 100 * (Nx - 1)                  # fractional index, about 4.1667
indices_below = np.floor(indices).astype(int)  # 4
indices_above = indices_below + 1              # 5
weight_above = indices - indices_below
weight_below = 1 - weight_above
x1 = ap[indices_below] * weight_below # 57.50000000000004
x2 = ap[indices_above] * weight_above # 12.499999999999956
x1 + x2
70.0
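If what you want is the q that np.percentile would map back to 73.94, one possible approach is to invert the linear interpolation yourself. The sketch below is an assumption on my part, not something either library provides as a single call: it uses np.interp over the sorted values, and it is only unambiguous where the sorted values are not tied (the repeated 69s make the inverse ill-defined at exactly 69). The names q_grid and q_for_val are illustrative.

```python
import numpy as np

df = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

ap = np.sort(df)
n = ap.size

# Percentile assigned to each sorted position: index / (n - 1) * 100.
q_grid = np.arange(n) / (n - 1) * 100

# Interpolate from value to percentile (the reverse direction of np.percentile).
q_for_val = float(np.interp(val, ap, q_grid))

print(q_for_val)                     # about 96.47, not 83.33
print(np.percentile(df, q_for_val))  # about 73.94
```

This round trip works because both directions use the same linear rule, whereas percentileofscore only counts the elements below the score.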