Subplot for grouped value count bar plot

Question

My table looks like this:

YEAR    RESPONSIBLE DISTRICT
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
... ... ...
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO

After I run

g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

I get the following:

YEAR         RESPONSIBLE DISTRICT
2014         05 - LUBBOCK            12312
             15 - SAN ANTONIO        10457
             18 - DALLAS              9885
             04 - AMARILLO            9617
             08 - ABILENE             8730
                                     ...  
2020         21 - PHARR               5645
             25 - CHILDRESS           5625
             20 - BEAUMONT            5560
             22 - LAREDO              5034
             24 - EL PASO             4620

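I have 25 districts in total. Now I want to create 25 subplots, one per district. In each subplot, the years 2014-2020 should be on the x-axis and the value counts on the y-axis. How can I do that?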

Answer 1

Is this what you expect?

import matplotlib.pyplot as plt

fig, axs = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(15, 15))
for ax, (district, sr) in zip(axs.flat, g.groupby('RESPONSIBLE DISTRICT')):
    ax.set_title(district)
    ax.plot(sr.index.get_level_values('YEAR'), sr.values)
fig.tight_layout()

plt.show()

Answer 2

This should work.

import matplotlib.pyplot as plt
import pandas as pd


g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()


fig, axs = plt.subplots(5, 5, constrained_layout=True)

for ax, (district, dfi) in zip(axs.ravel(), g.groupby('RESPONSIBLE DISTRICT')):
    x = dfi.index.get_level_values('YEAR').values
    y = dfi.values
    ax.bar(x, y)
    ax.set_title(district)

plt.show()

Answer 3

The correct way to do this with only pandas is to reshape the dataframe with .pivot, and then plot it with pandas.DataFrame.plot.

Imports and Data

import pandas as pd
import numpy as np  # for test data
import seaborn as sns  # only for seaborn option

# test data
np.random.seed(365)
rows = 100000
data = {'YEAR': np.random.choice(range(2014, 2021), size=rows),
        'RESPONSIBLE DISTRICT': np.random.choice(['05 - LUBBOCK', '15 - SAN ANTONIO', '18 - DALLAS', '04 - AMARILLO', '08 - ABILENE', '21 - PHARR', '25 - CHILDRESS', '20 - BEAUMONT', '22 - LAREDO', '24 - EL PASO'], size=rows)}
df = pd.DataFrame(data)

# get the value count of each district by year and pivot the shape
dfp = df.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT']).reset_index(name='VC').pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC')

# display(dfp)
RESPONSIBLE DISTRICT  04 - AMARILLO  05 - LUBBOCK  08 - ABILENE  15 - SAN ANTONIO  18 - DALLAS  20 - BEAUMONT  21 - PHARR  22 - LAREDO  24 - EL PASO  25 - CHILDRESS
YEAR                                                                                                                                                                
2014                           1407          1406          1485              1456         1392           1456        1499         1458          1394            1452
2015                           1436          1423          1428              1441         1395           1400        1423         1442          1375            1399
2016                           1480          1381          1393              1415         1446           1442        1414         1435          1452            1454
2017                           1422          1388          1485              1447         1404           1401        1413         1470          1424            1426
2018                           1479          1424          1384              1450         1390           1384        1445         1435          1478            1386
2019                           1387          1317          1379              1457         1457           1476        1447         1459          1451            1406
2020                           1462          1452          1454              1448         1441           1428        1411         1407          1402            1445
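To make the reshaping step easier to follow, here is a minimal sketch on a tiny data set. The frame `df_small` and the names `counts`, `wide` and `alt` are hypothetical and only serve as illustration. The sketch also shows `pd.crosstab`, a different pandas function from the one used in this answer, which builds the same kind of table in one call.

```python
import pandas as pd

# Hypothetical miniature data set, for illustration only
df_small = pd.DataFrame({
    'YEAR': [2014, 2014, 2014, 2015, 2015, 2015],
    'RESPONSIBLE DISTRICT': ['01 - PARIS', '01 - PARIS', '15 - SAN ANTONIO',
                             '01 - PARIS', '15 - SAN ANTONIO', '15 - SAN ANTONIO'],
})

# Same chain as in the answer: count each (YEAR, DISTRICT) pair in long form ...
counts = df_small.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT']).reset_index(name='VC')

# ... then pivot to wide form: one row per year, one column per district
wide = counts.pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC')

# Alternative (not part of the original answer): crosstab counts and reshapes in one step
alt = pd.crosstab(df_small['YEAR'], df_small['RESPONSIBLE DISTRICT'])

print(wide)
```

One difference to keep in mind: if a year/district pair never occurs, the pivot leaves a NaN in that cell, while `pd.crosstab` fills it with 0.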

`pandas.DataFrame.plot`

Use kind='line' if a line plot is preferred.

# plot the dataframe
axes = dfp.plot(kind='bar', subplots=True, layout=(5, 5), figsize=(20, 20), legend=False)  # returns an array of Axes, not a Figure
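The kind='line' variant mentioned above could look roughly like the following sketch. The data set `df_demo`, the three-district grid, and the `Agg` backend line are assumptions made so the example is self-contained and runs without a display. Line plots drawn with legend=False may come out without titles, so the sketch sets them explicitly.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend; omit this line for interactive use
import numpy as np
import pandas as pd

# Hypothetical test data with three districts
rng = np.random.default_rng(365)
df_demo = pd.DataFrame({
    'YEAR': rng.choice(np.arange(2014, 2021), size=1000),
    'RESPONSIBLE DISTRICT': rng.choice(['04 - AMARILLO', '05 - LUBBOCK', '08 - ABILENE'], size=1000),
})

# Count per (YEAR, DISTRICT) and pivot, as in the answer
dfp_demo = (df_demo.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT'])
            .reset_index(name='VC')
            .pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC'))

# One line subplot per district; the call returns a 2D array of Axes
axes = dfp_demo.plot(kind='line', subplots=True, layout=(1, 3), figsize=(12, 3),
                     legend=False, sharey=True, marker='o')

# Label each subplot with its district name
for ax, district in zip(axes.flat, dfp_demo.columns):
    ax.set_title(district)

axes.flat[0].get_figure().savefig('districts_line.png')  # hypothetical output file name
```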

`seaborn.catplot`

seaborn is a high-level API for matplotlib.
This is the easiest approach, because the dataframe does not need to be reshaped.

p = sns.catplot(kind='count', data=df, col='RESPONSIBLE DISTRICT', col_wrap=5, x='YEAR', height=3.5)
p.set_titles(row_template='{row_name}', col_template='{col_name}')  # shortens the titles
