Groupby and average - Python
I currently have a DataFrame like this:
price  type               randomc1  randomc2  randomc3
2      Dumpling
1      Milk Based Drinks
2      Dumpling
3      Milk Based Drinks
7      Cold Cuts
5      Cold Cuts
I want the average price for each type of item.
Desired output:
type               average
Dumpling           2
Milk Based Drinks  2
Cold Cuts          6
There are also about 100 different "types", so ideally I would like to print the average for every type.
Any help would be appreciated.
Edit: output of print(df.to_dict()):
{'Dish_Type': ['Dumpling',
'Dumpling',
'Milk Based Drinks',
'Milk Based Drinks',
'Dumpling'],
'Dish_Price': ['.95', '.95', '.95', '.95', '.95']}
Unless I am misunderstanding, it does not matter how many distinct type values you have: groupby() takes all of them into account. Have you tried:
df.groupby('type',as_index=False).agg(average=pd.NamedAgg('price','mean'))
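To make the suggestion above concrete, here is a minimal sketch built from the sample rows in the question. The DataFrame construction is an assumption (the empty randomc columns are left out); the groupby call is the one shown above. Note that groupby sorts the group keys by default, so the rows come back in alphabetical order rather than in order of first appearance.

```python
import pandas as pd

# Sample rows taken from the question; the randomc columns are omitted
df = pd.DataFrame({
    "price": [2, 1, 2, 3, 7, 5],
    "type": ["Dumpling", "Milk Based Drinks", "Dumpling",
             "Milk Based Drinks", "Cold Cuts", "Cold Cuts"],
})

# One output row per distinct type, with the mean price in a column named "average"
out = df.groupby("type", as_index=False).agg(average=pd.NamedAgg("price", "mean"))
print(out)
```

Passing sort=False to groupby() should keep the types in the order they first appear, if that matters for the printed result.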
You can use:
out = (df.assign(Dish_Price=df['Dish_Price'].str.lstrip('$').astype(float))
         .groupby('Dish_Type', as_index=False)
         .agg(Dish_Average=('Dish_Price', 'mean')))
print(out)
# Output
Dish_Type Dish_Average
0 Dumpling 9.283333
1 Milk Based Drinks 8.950000
Setup:
import pandas as pd

data = {'Dish_Type': ['Dumpling', 'Dumpling', 'Milk Based Drinks',
                      'Milk Based Drinks', 'Dumpling'],
'Dish_Price': ['.95', '.95', '.95', '.95', '.95']}
df = pd.DataFrame(data)
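With around 100 types in the real data, some price strings may not parse cleanly, and astype(float) raises on the first bad value. The following is a hedged variant of the same approach, not the answerer's code: it swaps astype(float) for pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so that mean skips them. The sample values here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data (not the asker's real prices), including one bad entry
df = pd.DataFrame({
    "Dish_Type": ["Dumpling", "Dumpling", "Milk Based Drinks", "Milk Based Drinks"],
    "Dish_Price": ["$4.00", "$6.00", "$3.50", "n/a"],
})

# Strip the leading dollar sign, then convert; unparseable strings become NaN
prices = pd.to_numeric(df["Dish_Price"].str.lstrip("$"), errors="coerce")

out = (df.assign(Dish_Price=prices)
         .groupby("Dish_Type", as_index=False)
         .agg(Dish_Average=("Dish_Price", "mean")))
print(out)
```

Silently dropping bad values can hide data problems, so it may be worth checking prices.isna().sum() before trusting the averages.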