How to make a stacked barplot for percentage of three classes per year?

I need to make a stacked bar plot using this dataset (head):

import pandas as pd

data = {'model': ['A1', 'A6', 'A1', 'A4', 'A3'],
        'year': [2017, 2016, 2016, 2017, 2019],
        'price': [12500, 16500, 11000, 16800, 17300],
        'transmission': ['Manual', 'Automatic', 'Manual', 'Automatic', 'Manual'],
        'mileage': [15735, 36203, 29946, 25952, 1998],
        'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
        'tax': [150, 20, 30, 145, 145],
        'mpg': [55.4, 64.2, 55.4, 67.3, 49.6],
        'engineSize': [1.4, 2.0, 1.4, 2.0, 1.0]}

df = pd.DataFrame(data)

  model  year  price transmission  mileage fuelType  tax   mpg  engineSize
0    A1  2017  12500       Manual    15735   Petrol  150  55.4         1.4
1    A6  2016  16500    Automatic    36203   Diesel   20  64.2         2.0
2    A1  2016  11000       Manual    29946   Petrol   30  55.4         1.4
3    A4  2017  16800    Automatic    25952   Diesel  145  67.3         2.0
4    A3  2019  17300       Manual     1998   Petrol  145  49.6         1.0

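I want the years (1997-2021) on the x-axis and numbers from 0 to 100 on the y-axis representing percentages. Finally, I want the three fuel types (petrol, diesel, and hybrid) shown as a proportion of each year.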

I have already done the following calculation to get the percentage of each fuel type per year, and now I need to put it on a graph:

fuel_percentage = round((my_data_frame.groupby(['year'])['fuelType'].value_counts()/my_data_frame.groupby('year')['fuelType'].count())*100, 2)

print(fuel_percentage)

This gives me the following result:

year  fuelType
1997  Petrol      100.00
1998  Petrol      100.00
2002  Petrol      100.00
2003  Diesel       66.67
      Petrol       33.33
2004  Petrol       80.00
      Diesel       20.00
2005  Petrol       71.43
      Diesel       28.57
2006  Petrol       66.67
      Diesel       33.33
2007  Petrol       56.25
      Diesel       43.75
2008  Diesel       66.67
      Petrol       33.33
etc...

My main concern is that, since this object is not a DataFrame, I won't be able to use it to make a plot.

Here is an example of the kind of plot I want (with fuel types in place of players, and percentages on the y-axis):

Thanks for your help!

... EDIT ...

  • Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3

.groupby & .unstack

import pandas as pd

# I'm not a fan of this option because it requires doing .groupby twice
# calculate percent with groupby
dfc = (df.groupby(['year'])['fuelType'].value_counts() / df.groupby('year')['fuelType'].count()).mul(100).round(1)

# unstack the long dataframe
dfc = dfc.unstack(level=1)

  • .groupby.value_counts.unstack

dfc = df.groupby(['year'])['fuelType'].value_counts(normalize=True).mul(100).round(1).unstack(level=1)

.crosstab

# get the normalized value counts by index
dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)
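
One difference between these options is worth checking before plotting: .unstack leaves NaN where a year has no rows for a fuel type, while pd.crosstab fills those cells with 0. The sketch below, which assumes only the five sample rows from the question, passes fill_value=0 to .unstack so both tables match; the names via_groupby and via_crosstab are illustrative.

```python
import pandas as pd

# Sample rows from the question (only the two columns needed here)
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# groupby + value_counts + unstack: a year/fuel pair with no rows would be NaN,
# so fill_value=0 is passed to get an explicit 0 instead
via_groupby = (
    df.groupby('year')['fuelType']
      .value_counts(normalize=True)
      .mul(100)
      .round(1)
      .unstack(level=1, fill_value=0)
)

# crosstab: missing year/fuel pairs are already 0
via_crosstab = pd.crosstab(df['year'], df['fuelType'], normalize='index').mul(100).round(1)

print(via_groupby)
print(via_crosstab)
```

Either result is a regular DataFrame with one row per year and one column per fuel type, which is the shape DataFrame.plot expects for stacked bars.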

Plot

# display(dfc)
fuelType  Diesel  Petrol
year                    
2016        50.0    50.0
2017        50.0    50.0
2019         0.0   100.0

# plot bar
ax = dfc.plot(kind='bar', ylabel='Percent(%)', stacked=True, rot=0, figsize=(10, 4))

  • Remove xticks=dfc.index to let the plot API put more values on the x-axis.
# plot area
ax = dfc.plot(kind='area', ylabel='Percent(%)', rot=0, figsize=(10, 4), xticks=dfc.index)
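
To cover the other requirements in the question (every year from 1997 to 2021 on the x-axis, a y-axis running from 0 to 100, and all three fuel types in the legend), the table can be reindexed before plotting. This is a minimal sketch that assumes the five sample rows from the question; fuel_order and all_years are illustrative names, and the 'Hybrid' label is an assumption about how that class is spelled in the full dataset.

```python
import pandas as pd

# Sample rows from the question (only the two columns needed here)
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# percent of each fuel type within each year
dfc = pd.crosstab(df['year'], df['fuelType'], normalize='index').mul(100).round(1)

fuel_order = ['Petrol', 'Diesel', 'Hybrid']  # 'Hybrid' spelling is assumed
all_years = list(range(1997, 2022))          # 1997 through 2021 inclusive

# add the missing years and fuel types as rows/columns of zeros
dfc = dfc.reindex(index=all_years, columns=fuel_order, fill_value=0)

# stacked bars, one bar per year, with the y-axis fixed to 0-100
ax = dfc.plot(kind='bar', stacked=True, rot=90, figsize=(10, 4),
              xlabel='year', ylabel='Percent (%)')
ax.set_ylim(0, 100)
ax.legend(title='fuelType', bbox_to_anchor=(1.02, 1), loc='upper left')
ax.figure.tight_layout()
```

Years with no rows in the data appear as empty slots on the x-axis. In a script, call matplotlib.pyplot.show() or ax.figure.savefig(...) afterwards to display or save the figure.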