How to calculate mean and median based on the label of a column in Python

Question

I have a large DataFrame that looks like this:

price   type      status
2       shoes      none
3       clothes    none
6       clothes    none
3       shoes      none
4       shoes      none
6       shoes      none
2       clothes    none
3       shoes      none
6       clothes    none
8       clothes    done

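Basically, whenever "status" is "done", I want to calculate the mean and median based on "type". What I have done so far is first create groups that each end at a row with status "done", and then calculate the mean and median of each group, as in the following script: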

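# Label each row with its block; a block ends at a row where status == 'done'
g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
grouper = df.groupby(g)
# Mean and median price per block
df_statistics = grouper.agg(
    mean=('price', 'mean'),
    median=('price', 'median'),
)
# Most frequent price per block
df_freq = grouper.apply(lambda x: x['price'].value_counts().idxmax())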

How can I add "type" as one more grouping parameter, so that the script also computes the mean and median of each group per "type"?

Thanks

Answer 1

I think you need to put the column name in a list together with g and pass that list to groupby:

grouper = df.groupby([g, 'type'])
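For reference, here is a minimal end-to-end sketch of that idea, using the sample data from the question. The names block, stats and most_frequent are my own choices for illustration, not something from the original script, and renaming g to block is only done so the index levels are easier to read. Note that with this sample there is only one "done" row, at the very end, so every row falls into block 1.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "price": [2, 3, 6, 3, 4, 6, 2, 3, 6, 8],
    "type": ["shoes", "clothes", "clothes", "shoes", "shoes",
             "shoes", "clothes", "shoes", "clothes", "clothes"],
    "status": ["none"] * 9 + ["done"],
})

# Reverse cumulative sum of the 'done' flags: each row gets the id of the
# block that ends at the next 'done' row at or below it.
# The name 'block' is an illustrative choice.
g = df["status"].eq("done").iloc[::-1].cumsum().iloc[::-1].rename("block")

# Group by both the block id and the 'type' column
stats = df.groupby([g, "type"]).agg(
    mean=("price", "mean"),
    median=("price", "median"),
    # most frequent price, same idea as df_freq in the question
    most_frequent=("price", lambda s: s.value_counts().idxmax()),
)

print(stats)
```

The result has a two-level index (block, type), so a single group can be looked up with something like stats.loc[(1, 'clothes')].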

python

statistics

mean

median

pandas