Calculating percentage in Python Pandas library
I have a Pandas DataFrame like this:
import pandas as pd
df = pd.DataFrame(
    {'gender': ['F','F','F','F','F','M','M','M','M','M'],
     'mature': [0,1,0,0,0,1,1,1,0,1],
     'cta':    [1,1,0,1,0,0,0,1,0,1]}
)
df['gender'] = df['gender'].astype('category')
df['mature'] = df['mature'].astype('category')
df['cta'] = pd.to_numeric(df['cta'])
df
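I have calculated the sum (how many times people clicked) and the count (the total number of messages sent). I want to figure out how to calculate the percentage, defined as clicks/total, and how to get a DataFrame as output.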
temp_groupby = df.groupby('gender').agg({'cta': [('clicks','sum'),
                                                 ('total','count')]})
temp_groupby
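I think this means you need the mean: because cta contains only 0 and 1, clicks/total per group is the mean of cta. Add a new tuple to the list, for example: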
temp_groupby = df.groupby('gender').agg({'cta': [('clicks','sum'),
                                                 ('total','count'),
                                                 ('perc', 'mean')]})
print(temp_groupby)
          cta
       clicks total perc
gender
F           3     5  0.6
M           2     5  0.4
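As a quick sanity check on why the mean works here, the sketch below computes clicks/total explicitly and compares it with the group mean. It rebuilds only the gender and cta columns from the question; the names grouped, ratio, mean and check are illustrative and not part of the original answer.

```python
import pandas as pd

# Same gender/cta values as in the question (the 'mature' column is not needed here).
df = pd.DataFrame({
    'gender': ['F','F','F','F','F','M','M','M','M','M'],
    'cta':    [1,1,0,1,0,0,0,1,0,1],
})

grouped = df.groupby('gender')['cta']

# Explicit definition from the question: clicks / total.
ratio = grouped.sum() / grouped.count()

# For a column holding only 0 and 1, the mean is the same quantity.
mean = grouped.mean()

# Put both side by side for comparison.
check = pd.DataFrame({'ratio': ratio, 'mean': mean})
print(check)
```

This equivalence relies on cta being strictly 0/1 with no missing values; with other values the mean would no longer be a click rate.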
To avoid a MultiIndex in the columns, specify the column after groupby:
temp_groupby = df.groupby('gender')['cta'].agg([('clicks','sum'),
                                                ('total','count'),
                                                ('perc', 'mean')]).reset_index()
print(temp_groupby)
  gender  clicks  total  perc
0      F       3      5   0.6
1      M       2      5   0.4
Or use named aggregation:
temp_groupby = df.groupby('gender', as_index=False).agg(clicks=('cta','sum'),
                                                        total=('cta','count'),
                                                        perc=('cta','mean'))
print(temp_groupby)
  gender  clicks  total  perc
0      F       3      5   0.6
1      M       2      5   0.4
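If the value is wanted on a 0-100 scale, or derived from the clicks and total columns rather than from the mean, one possible sketch is to chain assign after the named aggregation. The variable name summary and the column name perc_100 are assumptions made for illustration.

```python
import pandas as pd

# Same gender/cta values as in the question.
df = pd.DataFrame({
    'gender': ['F','F','F','F','F','M','M','M','M','M'],
    'cta':    [1,1,0,1,0,0,0,1,0,1],
})

summary = (
    df.groupby('gender', as_index=False)
      .agg(clicks=('cta', 'sum'), total=('cta', 'count'))
      # Derive the percentage from the two aggregated columns.
      .assign(perc_100=lambda d: d['clicks'] * 100 / d['total'])
)
print(summary)
```

Computing the ratio from clicks and total keeps the definition from the question visible in the code, at the cost of one extra step compared with using 'mean' directly.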