How to calculate percentage of each class variable after grouping it by another variable?

Question

I have data that initially looks like this:

column1	Time
yes	271-273
no	271-273
neutral	271-273
no	274-276
...	...

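I want the percentage of "yes", "no", and "neutral" within each time range. I was able to get the count of each of these categories (yes, no, neutral) per time range with the following code: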

df['COUNTER'] = 1
group_data = df.groupby(['Time','column1'])['COUNTER'].sum()

I'm not sure how to turn these counts into percentages.

Answer 1

Use SeriesGroupBy.value_counts with the parameter normalize=True, which returns each value's share of its group as a fraction between 0 and 1:

print (df)
   column1     Time
0      yes  271-273
1       no  271-273
2  neutral  271-273
3       no  271-273 <- changed data for better sample

group_data = (df.groupby(['Time'])['column1']
                .value_counts(normalize=True)
                .reset_index(name='%') )

print (group_data)
      Time  column1     %
0  271-273       no  0.50
1  271-273  neutral  0.25
2  271-273      yes  0.25
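
The values above are fractions rather than percentages on a 0-100 scale. As a minimal sketch, assuming the four-row sample frame shown in the printed output (the names `shares`, `pct` and the column name `percent` are illustrative, not part of the original answer), the result could be scaled like this:

```python
import pandas as pd

# Sample frame rebuilt from the printed output above (4 rows, one time range).
df = pd.DataFrame({
    "column1": ["yes", "no", "neutral", "no"],
    "Time": ["271-273"] * 4,
})

# Share of each answer within its time range, as a fraction in [0, 1].
shares = df.groupby("Time")["column1"].value_counts(normalize=True)

# Scale to 0-100 and flatten the (Time, column1) index into ordinary columns.
# "percent" is an assumed column name chosen for this sketch.
pct = shares.mul(100).round(2).reset_index(name="percent")
print(pct)
```

Keeping the fractions and formatting them only for display is an equally reasonable choice; the scaling step is shown just to match the wording of the question.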

Another idea is to count the rows with DataFrameGroupBy.size and divide by the sum of counts per time range using Series.div:

s = df.groupby(['Time','column1']).size()
group_data = s.div(s.groupby(level=0).sum()).reset_index(name='%') 
print (group_data)

      Time  column1     %
0  271-273  neutral  0.25
1  271-273       no  0.50
2  271-273      yes  0.25
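
To check that the division is done per time range and not over the whole frame, here is a small sketch on hypothetical data with two time ranges (the rows are made up for illustration and are not the asker's full dataset). It passes `level="Time"` to Series.div so the alignment is explicit, and it also shows `pd.crosstab` with `normalize="index"` as an alternative that gives the same shares in a wide layout; that alternative is not part of the original answer.

```python
import pandas as pd

# Hypothetical data with two time ranges (illustrative only).
df = pd.DataFrame({
    "column1": ["yes", "no", "neutral", "no", "no", "yes"],
    "Time": ["271-273"] * 4 + ["274-276"] * 2,
})

# Number of rows for each (Time, column1) pair.
counts = df.groupby(["Time", "column1"]).size()

# Number of rows for each Time; dividing on the "Time" level
# matches every count with the total of its own time range.
totals = counts.groupby(level="Time").sum()
shares = counts.div(totals, level="Time")
print(shares.reset_index(name="%"))

# Alternative: one row per Time, one column per answer.
# Answers missing from a time range appear as 0.0.
wide = pd.crosstab(df["Time"], df["column1"], normalize="index")
print(wide)
```

The long format from `shares` is convenient for further grouping or plotting, while the wide table from `crosstab` is easier to read when there are only a few categories.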

python

counter

group-by

sum

pandas