Which method does pandas use for percentile?

Question

I am trying to understand how pandas calculates the lower/upper percentiles, but I am a bit confused. Here is sample code and its output.

import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])
test.describe()

Output:

I am only interested in the 25% and 75% percentiles. How does pandas calculate them?

In other words, which statistical/mathematical method does pandas use to calculate percentiles?

Answer 1

If percentiles is not provided, describe() computes [series.quantile(x) for x in percentiles], where percentiles = np.array([0.25, 0.5, 0.75]).
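A quick way to check this is to compare the describe() rows with direct quantile() calls. This is only a sketch on the series from the question; the variable names summary and by_hand are made up for illustration.

```python
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])

# Default percentiles that describe() reports when none are passed.
percentiles = [0.25, 0.5, 0.75]

summary = test.describe()

# Build the same labels describe() uses ("25%", "50%", "75%").
by_hand = {f"{int(p * 100)}%": test.quantile(p) for p in percentiles}

for label, value in by_hand.items():
    # Each describe() row should agree with the matching quantile() call.
    print(label, value, summary[label])
```

For this series both columns should agree, which suggests the percentile rows in describe() are just Series.quantile results.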


Answer 2

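As I mentioned in the comments, I finally figured out how it works by trying the quantile function (from pandas.core.algorithms import quantile), as @Abdou suggested.

I am not good at explaining things in writing alone, so I will just work through the 25% and 75% cases for the given example. Here is a short (and possibly poor) explanation: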

For the example list [7, 15, 36, 39, 40, 41], the values sit at the following quantile positions:

7 -> 0%

15 -> 20%

36 -> 40%

39 -> 60%

40 -> 80%

41 -> 100%
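The positions above can be reproduced with a few lines of Python. This sketch assumes that the i-th sorted value (counting from 0) sits at the fraction i / (n - 1), which is what the list implies; the name positions is made up for illustration.

```python
values = [7, 15, 36, 39, 40, 41]
n = len(values)

# Assumption: the i-th sorted value (0-based) sits at fraction i / (n - 1),
# so the minimum is at 0% and the maximum at 100%.
positions = [(v, i / (n - 1)) for i, v in enumerate(sorted(values))]

for v, frac in positions:
    print(f"{v} -> {frac:.0%}")
```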

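We want the 25% percentile, which lies between 15 (at 20%) and 36 (at 40%). It is 5% past the 20% mark, so the value is 15 + (36-15)/4 = 15 + 5.25 = 20.25.

We use (36-15)/4 because the gap between 15 and 36 spans 40% - 20% = 20%, so dividing it by 4 gives the step for 5%.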

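The 75% percentile is found the same way. It is 15% past the 60% mark, i.e. three quarters of the way from 39 (at 60%) to 40 (at 80%):

39 + 3*(40-39)/4 = 39.75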
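The hand calculation for both percentiles can be written as a small function and compared with pandas. This is a sketch of linear interpolation between neighbouring sorted values, which matches the default interpolation='linear' of Series.quantile; linear_quantile is a hypothetical helper, not a pandas function.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between neighbours."""
    data = sorted(values)
    pos = q * (len(data) - 1)        # fractional index into the sorted data
    lo = math.floor(pos)             # index of the value just below the target
    hi = min(lo + 1, len(data) - 1)  # index of the value just above (clamped)
    frac = pos - lo                  # how far between the two neighbours
    return data[lo] + (data[hi] - data[lo]) * frac


values = [7, 15, 36, 39, 40, 41]
series = pd.Series(values)

for q in (0.25, 0.75):
    # The manual result and pandas' default should agree.
    print(q, linear_quantile(values, q), series.quantile(q))
```

For q = 0.25 the fractional index is 1.25, so the result is 15 + 0.25 * (36 - 15) = 20.25; for q = 0.75 it is 3.75, giving 39 + 0.75 * (40 - 39) = 39.75.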

That's it. Sorry for the poor explanation.

Note: thanks to @shin for the correction mentioned in the comments.
