Find average of each

I have a dataframe that looks like this:

id  name       industry           income
1   apple      telecommunication  100
2   oil        gas                100
3   samsung    telecommunication  200
4   coinbase   crypto             100
5   microsoft  telecommunication  30

What I want to do is find the average income for each industry. That would be: telecommunication 110, gas 100, crypto 100.

What I did was find the frequency of each industry:

df['industry'].value_counts()

This gives:

industry
telecommunication       3
gas                     1
crypto                  1

I also found the total income for each industry:

df.groupby(['industry']).sum()['income']

This gives:

industry
telecommunication       330
gas                     100
crypto                  100

Now I'm confused about how to continue...

You are looking for mean:

means = df.groupby('industry')['income'].mean()

Output:

>>> means
industry
crypto               100.0
gas                  100.0
telecommunication    110.0
Name: income, dtype: float64

>>> means['telecommunication']
110.0
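
To connect this with the two intermediate results in the question, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the sample table, and the variable name manual_means is only illustrative. It checks that dividing the per-industry sum by the per-industry count gives the same values as mean():

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

grouped = df.groupby("industry")["income"]

# Mean computed directly by pandas
means = grouped.mean()

# The same values from the two steps already tried: sum divided by group size
manual_means = grouped.sum() / grouped.size()

print(means)
print(manual_means)
```

So the sum and the count were already enough to get the answer; mean() just does the division in one call.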

If you want to keep all the other details, use groupby with transform:

df['mean'] = df.groupby('industry')['income'].transform('mean')

   id       name           industry  income   mean
0   1      apple  telecommunication     100  110.0
1   2        oil                gas     100  100.0
2   3    samsung  telecommunication     200  110.0
3   4   coinbase             crypto     100  100.0
4   5  microsoft  telecommunication      30  110.0
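
One reason to prefer transform is that the group mean lines up row by row with the original data, so it can be used in further column arithmetic. A small sketch, assuming the same sample data as in the question; the column names industry_mean and diff_from_mean are made up for illustration:

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# transform returns one value per original row, aligned on the original index
df["industry_mean"] = df.groupby("industry")["income"].transform("mean")

# Because the shapes match, each row can be compared with its own group mean
df["diff_from_mean"] = df["income"] - df["industry_mean"]

print(df)
```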

If you need a summary frame:

df.groupby('industry')['income'].mean().to_frame('mean_income')

                   mean_income
industry
crypto                   100.0
gas                      100.0
telecommunication        110.0
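
If industry should be an ordinary column rather than the index, one possible variant is to pass as_index=False to groupby and rename the result. This is a sketch on the sample data from the question, not the only way to do it (calling reset_index() on the frame above should give a similar result):

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# as_index=False keeps the grouping key as a regular column
summary = df.groupby("industry", as_index=False)["income"].mean()

# Rename the aggregated column so it is clear that it holds a mean
summary = summary.rename(columns={"income": "mean_income"})

print(summary)
```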

Maybe you should use agg to avoid multiple operations:

out = df.groupby('industry', sort=False).agg(size=('income', 'size'), 
                                             mean=('income', 'mean'), 
                                             sum=('income', 'sum')).reset_index()
print(out)

# Output:
            industry  size   mean  sum
0  telecommunication     3  110.0  330
1                gas     1  100.0  100
2             crypto     1  100.0  100