Find average of each
I have a data frame that looks like this:
id  name       industry           income
1   apple      telecommunication  100
2   oil        gas                100
3   samsung    telecommunication  200
4   coinbase   crypto             100
5   microsoft  telecommunication  30
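What I want to do is find the average income of each industry.
The result would be: telecommunication 110, gas 100, crypto 100.
What I have done so far is find the frequency of each industry: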
df.groupby(['industry']).sum().value_counts('industry')
This results in:
industry
telecommunication 3
gas 1
crypto 1
I also found the sum of income for each industry:
df.groupby(['industry']).sum()['income']
This results in:
industry
telecommunication 330
gas 100
crypto 100
Now I am confused about how to continue...
You are looking for mean:
means = df.groupby('industry')['income'].mean()
Output:
>>> means
industry
crypto 100.0
gas 100.0
telecommunication 110.0
Name: income, dtype: float64
>>> means['telecommunication']
110.0
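To connect this with the attempt in the question, here is a minimal, self-contained sketch. It rebuilds the sample data frame by hand (the construction of df is an assumption, since the question only shows the printed table) and checks that dividing the per-industry sum by the per-industry count gives the same values as mean():

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

grouped = df.groupby("industry")["income"]

totals = grouped.sum()          # sum of income per industry
counts = grouped.size()         # number of rows per industry
manual_means = totals / counts  # the "sum divided by frequency" route

means = grouped.mean()          # the direct route

print(manual_means)
print(means)
```

Both routes should agree, so the separate sum and frequency steps are not needed when only the average is wanted.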
If you want to keep all the other details, use groupby with transform:
df['mean']=df.groupby('industry')['income'].transform('mean')
   id       name           industry  income   mean
0   1      apple  telecommunication     100  110.0
1   2        oil                gas     100  100.0
2   3    samsung  telecommunication     200  110.0
3   4   coinbase             crypto     100  100.0
4   5  microsoft  telecommunication      30  110.0
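Because transform returns one value per original row, the result lines up with the data frame and can be used in row-wise arithmetic. A small sketch, assuming the same sample data; the column name diff_from_mean is made up for illustration:

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# One group mean per row, aligned with df's index.
industry_mean = df.groupby("industry")["income"].transform("mean")

# Hypothetical extra column: how far each company is from its industry mean.
df["diff_from_mean"] = df["income"] - industry_mean

print(df)
```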
If you need a summary frame:
df.groupby('industry')['income'].mean().to_frame('mean_income')
                   mean_income
industry
crypto                   100.0
gas                      100.0
telecommunication        110.0
Maybe you should use agg to avoid multiple operations:
out = (df.groupby('industry', sort=False)
         .agg(size=('income', 'size'),
              mean=('income', 'mean'),
              sum=('income', 'sum'))
         .reset_index())
print(out)
# Output:
            industry  size   mean  sum
0  telecommunication     3  110.0  330
1                gas     1  100.0  100
2             crypto     1  100.0  100
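If custom column names are not needed, a shorter variant is to select the income column first and pass a list of function names to agg. This is only a sketch on the same sample data; the variable name summary is made up, and unlike the call above it leaves industry as the index and sorts the groups alphabetically:

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# Count, mean and sum per industry in a single pass over the groups.
summary = df.groupby("industry")["income"].agg(["size", "mean", "sum"])

print(summary)
```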