Calculating percentages with the Python pandas library

I have a pandas DataFrame like this:

import pandas as pd
df = pd.DataFrame(
    {'gender':['F','F','F','F','F','M','M','M','M','M'],
     'mature':[0,1,0,0,0,1,1,1,0,1],
     'cta'   :[1,1,0,1,0,0,0,1,0,1]}
)

df['gender'] = df['gender'].astype('category')
df['mature'] = df['mature'].astype('category')

df['cta']    = pd.to_numeric(df['cta'])
df

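I have calculated the sum (how many times people clicked) and the count (how many messages were sent). I want to figure out how to calculate the percentage, defined as clicks/total, and how to get the result as a DataFrame.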

temp_groupby = df.groupby('gender').agg({'cta': [('clicks','sum'),
                                  ('total','count')]})
temp_groupby

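I think you need the mean here: because cta contains only 0 and 1, its mean within each group is exactly clicks/total. Add a new tuple to the list, for example: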

temp_groupby = df.groupby('gender').agg({'cta': [('clicks','sum'),
                                                 ('total','count'),
                                                 ('perc', 'mean')]})
print (temp_groupby)
          cta           
       clicks total perc
gender                  
F           3     5  0.6
M           2     5  0.4
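
As a quick sanity check of the claim that the mean matches clicks/total, here is a minimal sketch that computes the ratio explicitly and compares it with the mean. It assumes cta holds only 0 and 1, as in the question; the names grouped, ratio, mean and check are only illustrative.

```python
import pandas as pd

# Same data as in the question (the 'mature' column is not needed here).
df = pd.DataFrame(
    {'gender': ['F', 'F', 'F', 'F', 'F', 'M', 'M', 'M', 'M', 'M'],
     'cta':    [1, 1, 0, 1, 0, 0, 0, 1, 0, 1]}
)

grouped = df.groupby('gender')['cta']

# clicks / total, computed explicitly per group
ratio = grouped.sum() / grouped.count()

# mean of the 0/1 column per group
mean = grouped.mean()

# Put both side by side; both Series are indexed by gender.
check = pd.DataFrame({'ratio': ratio, 'mean': mean})
print(check)
```

If cta could contain missing values, count and mean both skip them, so the two columns should still agree.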

To avoid a MultiIndex in the columns, select the column after groupby:

temp_groupby = df.groupby('gender')['cta'].agg([('clicks','sum'),
                                                ('total','count'),
                                                ('perc', 'mean')]).reset_index()
print (temp_groupby)
  gender  clicks  total  perc
0      F       3      5   0.6
1      M       2      5   0.4

Or use named aggregation:

temp_groupby = df.groupby('gender', as_index=False).agg(clicks=('cta', 'sum'),
                                                        total=('cta', 'count'),
                                                        perc=('cta', 'mean'))
print (temp_groupby)
  gender  clicks  total  perc
0      F       3      5   0.6
1      M       2      5   0.4
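
If the column were not strictly 0/1, or if you want the value on a 0 to 100 scale, one possible variation is to derive perc from the aggregated columns instead of relying on the mean. This is only a sketch; the name summary and the scaling by 100 are my own assumptions, not something the question asked for.

```python
import pandas as pd

# Same data as in the question (the 'mature' column is not needed here).
df = pd.DataFrame(
    {'gender': ['F', 'F', 'F', 'F', 'F', 'M', 'M', 'M', 'M', 'M'],
     'cta':    [1, 1, 0, 1, 0, 0, 0, 1, 0, 1]}
)

summary = (
    df.groupby('gender', as_index=False)
      .agg(clicks=('cta', 'sum'), total=('cta', 'count'))
      # derive the percentage from the two aggregated columns
      .assign(perc=lambda d: d['clicks'] * 100 / d['total'])
)
print(summary)
```

Here perc is 60.0 for F and 40.0 for M, the same ratios as above expressed as percentages.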