Subplot for grouped value count bar plot
My table looks something like this:
YEAR RESPONSIBLE DISTRICT
2014 01 - PARIS
2014 01 - PARIS
2014 01 - PARIS
2014 01 - PARIS
2014 01 - PARIS
... ... ...
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
After I run
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()
I get the following:
YEAR RESPONSIBLE DISTRICT
2014 05 - LUBBOCK 12312
15 - SAN ANTONIO 10457
18 - DALLAS 9885
04 - AMARILLO 9617
08 - ABILENE 8730
...
2020 21 - PHARR 5645
25 - CHILDRESS 5625
20 - BEAUMONT 5560
22 - LAREDO 5034
24 - EL PASO 4620
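I have 25 districts in total. I want to create 25 subplots, one per district. In each subplot, I want the years 2014-2020 on the x-axis and the value counts on the y-axis. How can I do this?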
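Both matplotlib answers below depend on the shape of g. The grouped value_counts returns a Series with a two-level MultiIndex, (YEAR, RESPONSIBLE DISTRICT), and the counts are its values. Here is a minimal check on a small hypothetical frame in the same long format:

```python
import pandas as pd

# Small hypothetical frame in the same long format as the table above
df = pd.DataFrame({
    'YEAR': [2014, 2014, 2014, 2015, 2015],
    'RESPONSIBLE DISTRICT': ['01 - PARIS', '01 - PARIS', '05 - LUBBOCK',
                             '05 - LUBBOCK', '05 - LUBBOCK'],
})

# One count per (year, district) pair that occurs in the data
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

print(list(g.index.names))          # the two index levels
print(g.loc[(2014, '01 - PARIS')])  # count for one (year, district) pair
```

Because the district is an index level, you can group by it again, or select it with xs, to get one series per subplot.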
Is this what you want?
import matplotlib.pyplot as plt

# 5 x 5 grid: one cell per district (25 districts)
fig, axs = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(15, 15))
# regroup the value counts by district; sr holds that district's counts per year
for ax, (district, sr) in zip(axs.flat, g.groupby('RESPONSIBLE DISTRICT')):
    ax.set_title(district)
    ax.plot(sr.index.get_level_values('YEAR'), sr.values)
fig.tight_layout()
plt.show()
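In the loop above, each sr still carries both index levels, so the years have to be pulled out with get_level_values. Another option is xs, which selects one district and drops that level, leaving a Series indexed by YEAR alone. The sketch below uses hypothetical random data with three districts:

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 3 districts over 2014-2020
np.random.seed(0)
rows = 2000
df = pd.DataFrame({
    'YEAR': np.random.choice(range(2014, 2021), size=rows),
    'RESPONSIBLE DISTRICT': np.random.choice(
        ['01 - PARIS', '05 - LUBBOCK', '15 - SAN ANTONIO'], size=rows),
})
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

# Select one district; only the YEAR level remains in the index
lubbock = g.xs('05 - LUBBOCK', level='RESPONSIBLE DISTRICT').sort_index()
print(lubbock)
```

Inside a plotting loop, ax.bar(lubbock.index, lubbock.values) would then plot that district directly.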
This should work.
import matplotlib.pyplot as plt
import pandas as pd

g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()
fig, axs = plt.subplots(5, 5, constrained_layout=True)
for ax, (district, dfi) in zip(axs.ravel(), g.groupby('RESPONSIBLE DISTRICT')):
    x = dfi.index.get_level_values('YEAR').values  # years for this district
    y = dfi.values  # value counts for those years
    ax.bar(x, y)
    ax.set_title(district)
plt.show()
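If there are fewer districts than grid cells, zip stops early and the remaining cells are left as empty axes. The sketch below hides those cells after the loop. It assumes hypothetical data with 10 districts on the same 5 x 5 grid, and it uses the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical test data: 10 made-up district names over 2014-2020
np.random.seed(1)
rows = 5000
districts = [f'{i:02d} - DISTRICT {i}' for i in range(1, 11)]
df = pd.DataFrame({
    'YEAR': np.random.choice(range(2014, 2021), size=rows),
    'RESPONSIBLE DISTRICT': np.random.choice(districts, size=rows),
})
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

fig, axs = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(15, 15),
                        constrained_layout=True)
groups = list(g.groupby('RESPONSIBLE DISTRICT'))
for ax, (district, sr) in zip(axs.ravel(), groups):
    ax.bar(sr.index.get_level_values('YEAR'), sr.values)
    ax.set_title(district)

# Hide the grid cells that did not receive a district
for ax in axs.ravel()[len(groups):]:
    ax.set_visible(False)
```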
- The correct way to do this with only pandas is to reshape the dataframe with .pivot, and then to correctly use pandas.DataFrame.plot.
Imports and Data
import pandas as pd
import numpy as np # for test data
import seaborn as sns # only for seaborn option
# test data
np.random.seed(365)
rows = 100000
data = {'YEAR': np.random.choice(range(2014, 2021), size=rows),
        'RESPONSIBLE DISTRICT': np.random.choice(['05 - LUBBOCK', '15 - SAN ANTONIO', '18 - DALLAS', '04 - AMARILLO', '08 - ABILENE', '21 - PHARR', '25 - CHILDRESS', '20 - BEAUMONT', '22 - LAREDO', '24 - EL PASO'], size=rows)}
df = pd.DataFrame(data)
# get the value count of each district by year and pivot the shape
dfp = df.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT']).reset_index(name='VC').pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC')
# display(dfp)
RESPONSIBLE DISTRICT 04 - AMARILLO 05 - LUBBOCK 08 - ABILENE 15 - SAN ANTONIO 18 - DALLAS 20 - BEAUMONT 21 - PHARR 22 - LAREDO 24 - EL PASO 25 - CHILDRESS
YEAR
2014 1407 1406 1485 1456 1392 1456 1499 1458 1394 1452
2015 1436 1423 1428 1441 1395 1400 1423 1442 1375 1399
2016 1480 1381 1393 1415 1446 1442 1414 1435 1452 1454
2017 1422 1388 1485 1447 1404 1401 1413 1470 1424 1426
2018 1479 1424 1384 1450 1390 1384 1445 1435 1478 1386
2019 1387 1317 1379 1457 1457 1476 1447 1459 1451 1406
2020 1462 1452 1454 1448 1441 1428 1411 1407 1402 1445
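The same wide frame (years as rows, districts as columns) can also be built with pd.crosstab, or by unstacking the grouped value counts from the question. These are alternatives to the answer's .pivot, not part of it. The sketch checks that all three agree on hypothetical random data:

```python
import numpy as np
import pandas as pd

# Hypothetical test data in the question's long format
np.random.seed(365)
rows = 10000
df = pd.DataFrame({
    'YEAR': np.random.choice(range(2014, 2021), size=rows),
    'RESPONSIBLE DISTRICT': np.random.choice(
        ['05 - LUBBOCK', '15 - SAN ANTONIO', '18 - DALLAS'], size=rows),
})

# The answer's approach: count pairs, move counts into a column, pivot wide
dfp = (df.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT'])
         .reset_index(name='VC')
         .pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC'))

# Alternative 1: cross-tabulate the two columns directly
ct = pd.crosstab(df['YEAR'], df['RESPONSIBLE DISTRICT'])

# Alternative 2: unstack the district level of the question's g
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()
us = g.unstack('RESPONSIBLE DISTRICT')
```

One difference shows up when a district has no rows in some year. crosstab fills that cell with 0, while pivot and unstack leave it as NaN.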
pandas.DataFrame.plot
- If a line plot is preferred, use kind='line'.
# plot the dataframe
axes = dfp.plot(kind='bar', subplots=True, layout=(5, 5), figsize=(20, 20), legend=False)  # returns an array of Axes
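With legend=False, each subplot loses the legend entry that would otherwise name its district. DataFrame.plot accepts a list for title when subplots=True and puts one item above each subplot, so the column names can be passed as titles. A sketch on hypothetical data, using the Agg backend:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import numpy as np
import pandas as pd

# Hypothetical test data, reshaped the same way as in the answer
np.random.seed(365)
rows = 10000
df = pd.DataFrame({
    'YEAR': np.random.choice(range(2014, 2021), size=rows),
    'RESPONSIBLE DISTRICT': np.random.choice(
        ['05 - LUBBOCK', '15 - SAN ANTONIO', '18 - DALLAS', '04 - AMARILLO'], size=rows),
})
dfp = (df.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT'])
         .reset_index(name='VC')
         .pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC'))

# One title per column; the list length must equal the number of columns
axes = dfp.plot(kind='bar', subplots=True, layout=(2, 2), figsize=(10, 8),
                legend=False, title=list(dfp.columns))
```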
seaborn.catplot
- seaborn is a high-level API for matplotlib.
- This is the easiest option, because the dataframe does not need to be reshaped.
p = sns.catplot(kind='count', data=df, col='RESPONSIBLE DISTRICT', col_wrap=5, x='YEAR', height=3.5)
p.set_titles(row_template='{row_name}', col_template='{col_name}') # shortens the titles