Get percentiles from a grouped dataframe

Question

I have a dataframe with 2 experimental groups, and I am trying to get the percentile distribution. However, the data is already grouped:

import pandas as pd

df = pd.DataFrame({'group': ['control', 'control', 'control', 'treatment', 'treatment', 'treatment'],
                   'month': [1, 4, 9, 2, 5, 12],
                   'ct': [8, 4, 2, 5, 5, 7]})

I want to find which month represents the 25th, 50th, and 75th percentile for each group, but the dataframe is already grouped by the group/month variables.

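Update 1: I realize I didn't make clear what I'm having trouble with. This is a grouped dataframe, so control, for example, has 8 data points where month = 1, 4 data points where month = 4, and 2 data points where month = 9. The percentile values should therefore be: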

x = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 9, 9])
x.quantile([0.25,0.5,0.75])
>> 0.25    1.0
   0.50    1.0
   0.75    4.0
   dtype: float64
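Why those three numbers come out, as a worked check: this assumes pandas' default interpolation='linear' rule for Series.quantile, with the sorted expanded values x indexed from 0 and n the total count.

```latex
\begin{aligned}
h &= (n-1)\,p, \qquad
Q(p) = x_{(\lfloor h\rfloor)} + \bigl(h-\lfloor h\rfloor\bigr)\bigl(x_{(\lceil h\rceil)} - x_{(\lfloor h\rfloor)}\bigr) \\
n &= 8 + 4 + 2 = 14 \quad (\text{control: positions } 0\text{--}7 = 1,\; 8\text{--}11 = 4,\; 12\text{--}13 = 9) \\
p = 0.25 &:\; h = 3.25,\; x_{(3)} = x_{(4)} = 1 \;\Rightarrow\; Q = 1 \\
p = 0.50 &:\; h = 6.5,\; x_{(6)} = x_{(7)} = 1 \;\Rightarrow\; Q = 1 \\
p = 0.75 &:\; h = 9.75,\; x_{(9)} = x_{(10)} = 4 \;\Rightarrow\; Q = 4
\end{aligned}
```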

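Grouping by group and taking the quantile does not give the correct answer. Is there a way to expand the counts and take the percentiles of the ungrouped values? The final object should have these values: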

             p25 p50 p75
control      1   1   4
treatment    2   5   12
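A minimal sketch of the "expand the counts" idea applied to the whole frame at once. It assumes Index.repeat with a per-row array of repeat counts (the ct column) and reuses the p25/p50/p75 labels from the desired output; the variable names expanded and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"],
    "month": [1, 4, 9, 2, 5, 12],
    "ct": [8, 4, 2, 5, 5, 7],
})

# Repeat each row ct times so every counted observation becomes its own row
expanded = df.loc[df.index.repeat(df["ct"].to_numpy())]

# Ordinary (default linear-interpolation) quantiles of month within each group
result = (
    expanded.groupby("group")["month"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
result.columns = ["p25", "p50", "p75"]
print(result)
```

This trades memory for simplicity: the expanded frame has sum(ct) rows (14 + 17 here), which is fine for small counts.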

Answer 1

You can try using quantile with the desired percentiles passed as a list:

df.groupby('group').quantile([0.25,0.50,0.75])

输出：

                    ct  month
group           
control     0.25    3.0 2.5
            0.50    4.0 4.0
            0.75    6.0 6.5
treatment   0.25    5.0 3.5
            0.50    5.0 5.0
            0.75    6.0 8.5
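A short check of why these month values differ from the expected table: groupby(...).quantile treats each row as one observation, so for control it only sees the months [1, 4, 9] and ct is not used as a weight. The variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"],
    "month": [1, 4, 9, 2, 5, 12],
    "ct": [8, 4, 2, 5, 5, 7],
})
qs = [0.25, 0.5, 0.75]

# One observation per row: ct plays no role here
per_row = df.groupby("group")["month"].quantile(qs).unstack()

# The same control numbers come from the bare month values alone
control_months = pd.Series([1, 4, 9]).quantile(qs)

print(per_row)
print(control_months)
```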

Answer 2

You can use Series.repeat and then take the quantiles:

df.groupby('group').apply(lambda x: (x.month.repeat(x.ct)).quantile([0.25, 0.5, 0.75])).rename_axis([None], axis=1)

           0.25  0.50  0.75
group                      
control     1.0   1.0   4.0
treatment   2.0   5.0  12.0
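If the counts are large, materializing the repeats can be wasteful. The following is a sketch of a swapped-in technique, not part of this answer: it walks the cumulative counts to find the sorted positions that pandas' default linear rule would read, without building the repeated Series. repeated_quantiles is a hypothetical helper, and it assumes non-negative integer counts with a positive total.

```python
import numpy as np
import pandas as pd


def repeated_quantiles(values, counts, qs):
    """Hypothetical helper: linear-interpolation quantiles of `values`, each
    repeated `counts` times, computed without building the repeated array.
    Assumes non-negative integer counts with a positive total."""
    v = np.asarray(values)
    c = np.asarray(counts)
    order = np.argsort(v, kind="stable")
    v, c = v[order], c[order]
    cum = np.cumsum(c)  # cum[k] = number of expanded points up to and including v[k]
    n = cum[-1]
    out = []
    for q in qs:
        h = (n - 1) * q  # fractional 0-based position, as in the 'linear' rule
        lo, hi = int(np.floor(h)), int(np.ceil(h))
        # Expanded position i holds v[k] for the first k with cum[k] > i
        v_lo = v[np.searchsorted(cum, lo, side="right")]
        v_hi = v[np.searchsorted(cum, hi, side="right")]
        out.append(v_lo + (h - lo) * (v_hi - v_lo))
    return pd.Series(out, index=qs)


df = pd.DataFrame({
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"],
    "month": [1, 4, 9, 2, 5, 12],
    "ct": [8, 4, 2, 5, 5, 7],
})
qs = [0.25, 0.5, 0.75]

# One row per group, one column per requested percentile
result = pd.DataFrame(
    {name: repeated_quantiles(g["month"], g["ct"], qs) for name, g in df.groupby("group")}
).T
print(result)
```

For the sample frame this should reproduce the same rows as the Series.repeat version; the trade-off is a Python loop over the requested percentiles instead of memory proportional to the total count.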

Answer 3

You may want to look at describe:

df.groupby('group').describe().stack()
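On the grouped frame as given, describe also counts each row once. A sketch, assuming the rows are first expanded by ct with Index.repeat, so that the 25%/50%/75% rows of describe reflect the counts (summary is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"],
    "month": [1, 4, 9, 2, 5, 12],
    "ct": [8, 4, 2, 5, 5, 7],
})

# Expand each row ct times, then summarize month per group
expanded = df.loc[df.index.repeat(df["ct"].to_numpy())]
summary = expanded.groupby("group")["month"].describe()
print(summary[["count", "25%", "50%", "75%"]])
```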
