How to create a yearly bar plot grouped by months
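I'm having trouble creating a bar plot of a DataFrame grouped by year and month. With the following code, I'm trying to plot the data in the figure I created, but a second figure is returned instead. I'm also trying to move the legend to the right and change its values to the corresponding month names.
I'm starting to get a feel for the DataFrame produced by the groupby command, but since I'm not getting the result I expected, I'm asking here.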
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.read_csv('fcc-forum-pageviews.csv', index_col='date')
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]
fig, ax = plt.subplots(figsize=(10,10))
bar_plot = line_plot.groupby([line_plot.index.year, line_plot.index.month]).mean().unstack()
bar_plot.plot(kind='bar')
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
plt.show()
This is the format of the data I'm analyzing:
date,value
2016-05-09,1201
2016-05-10,2329
2016-05-11,1716
2016-05-12,10539
2016-05-13,6933
Just pass the ax you defined to pandas:
bar_plot.plot(ax = ax, kind='bar')
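A minimal sketch of why passing ax matters, using a small hypothetical bar_plot frame instead of the CSV: without ax=, pandas opens a new figure of its own and leaves the one from plt.subplots empty; with ax=, it draws into the existing Axes and returns that same object. The Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for bar_plot: rows are years, columns are month numbers
bar_plot = pd.DataFrame(
    np.arange(1, 25).reshape(2, 12),
    index=[2016, 2017],
    columns=range(1, 13),
)

# Without ax=: pandas creates a new figure, so the subplots() figure stays empty
plt.close("all")
fig, ax = plt.subplots(figsize=(10, 10))
bar_plot.plot(kind="bar")
figures_without_ax = len(plt.get_fignums())

# With ax=: pandas draws into the existing Axes and returns it
plt.close("all")
fig, ax = plt.subplots(figsize=(10, 10))
returned_ax = bar_plot.plot(ax=ax, kind="bar")
figures_with_ax = len(plt.get_fignums())
plt.close("all")

print(figures_without_ax, figures_with_ax, returned_ax is ax)
```

The first count reproduces the "second figure" from the question; the second shows that everything ends up in the figure you created.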
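If you also want to replace the month numbers with names, you have to get the legend labels, replace the numbers with names, and redefine the legend by passing it the new labels: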
handles, labels = ax.get_legend_handles_labels()
new_labels = [datetime.date(1900, int(monthinteger), 1).strftime('%B') for monthinteger in labels]
ax.legend(handles = handles, labels = new_labels, loc = 'upper left', bbox_to_anchor = (1.02, 1))
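The label conversion can also be done with the standard library's calendar.month_name instead of building datetime.date(1900, ...) objects. A small sketch; month_labels is a hypothetical helper name, and the names follow the current locale (English in Python's default C locale):

```python
import calendar


def month_labels(labels):
    """Map legend labels such as '1'..'12' to month names.

    The legend labels come back as strings, hence the int()
    conversion before indexing calendar.month_name.
    """
    # calendar.month_name[0] is an empty string, so 1 -> 'January', 12 -> 'December'
    return [calendar.month_name[int(label)] for label in labels]


print(month_labels(["1", "5", "12"]))
```

In the code above it would be used as ax.legend(handles=handles, labels=month_labels(labels), loc='upper left', bbox_to_anchor=(1.02, 1)).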
Complete code
import pandas as pd
from matplotlib import pyplot as plt
import datetime
df = pd.read_csv('fcc-forum-pageviews.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]
fig, ax = plt.subplots(figsize=(10,10))
bar_plot = line_plot.groupby([line_plot.index.year, line_plot.index.month]).mean().unstack()
bar_plot.plot(ax = ax, kind='bar')
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
handles, labels = ax.get_legend_handles_labels()
new_labels = [datetime.date(1900, int(monthinteger), 1).strftime('%B') for monthinteger in labels]
ax.legend(handles = handles, labels = new_labels, loc = 'upper left', bbox_to_anchor = (1.02, 1))
plt.show()
(Plot generated with fake data)
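Since fcc-forum-pageviews.csv isn't available here, the sketch below runs the same pipeline on hypothetical random daily data, with one swapped-in step: the month-number columns are renamed via calendar.month_name before plotting, so the legend is correct from the start and doesn't need to be rebuilt. The date range, the random values, the Agg backend, and the 'year'/'month' key names are assumptions.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for fcc-forum-pageviews.csv: one row per day, one 'value' column
rng = np.random.default_rng(0)
idx = pd.date_range("2016-05-09", "2019-12-03", freq="D", name="date")
df = pd.DataFrame({"value": rng.integers(1_000, 200_000, size=len(idx))}, index=idx)

# Same 2.5% / 97.5% quantile filter as in the question
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]

# Mean per (year, month); unstack() moves the month numbers into columns
years = line_plot.index.year.rename("year")
month_numbers = line_plot.index.month.rename("month")
bar_plot = line_plot.groupby([years, month_numbers]).mean().unstack()

# Rename the month-number columns before plotting, so no legend fix-up is needed afterwards
bar_plot = bar_plot.rename(columns=lambda m: calendar.month_name[int(m)])

fig, ax = plt.subplots(figsize=(10, 10))
bar_plot.plot(ax=ax, kind="bar")
ax.set_xlabel("Years")
ax.set_ylabel("Average Page Views")
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1))

legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
plt.close(fig)
```

Months missing from a year (January to April 2016 here) simply become NaN and are drawn as empty slots.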
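- Add an ordered categorical 'months' column with pd.Categorical.
- Use pd.pivot_table to reshape the dataframe into a wide format; aggfunc='mean' is the default.
  - A wide format is usually the easiest shape for plotting grouped bars.
- pandas.DataFrame.plot returns a matplotlib.axes.Axes, so there's no need to create one first with fig, ax = plt.subplots(figsize=(10,10)).
- The pandas .dt accessor is used to extract the various components of 'date', which must be a datetime dtype.
  - If 'date' is not a datetime dtype, convert it with df.date = pd.to_datetime(df.date).
- Tested with python 3.8.11, pandas 1.3.1, and matplotlib 3.4.2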
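To see what the ordered Categorical and the default aggfunc actually do, here is a minimal sketch on a four-row hypothetical frame. It passes observed=True to pivot_table explicitly (not in the answer); that keeps only the months present in the data and avoids the warning about the observed default that recent pandas versions may emit.

```python
from calendar import month_name

import pandas as pd

months = month_name[1:]  # ['January', ..., 'December'] in the default locale

# Tiny hypothetical frame; January appears twice
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-01", "2021-01-15", "2021-02-10", "2021-01-20"]),
    "value": [30, 10, 20, 50],
})

# Plain strings sort alphabetically: February comes before January
alphabetical = sorted(df.date.dt.strftime("%B"))

# An ordered Categorical sorts in calendar order instead
df["months"] = pd.Categorical(df.date.dt.strftime("%B"), categories=months, ordered=True)
calendar_order = df.sort_values("months")["months"].tolist()

# aggfunc defaults to 'mean', so January becomes (10 + 50) / 2
dfp = pd.pivot_table(data=df, index=df.date.dt.year, columns="months", values="value", observed=True)

print(alphabetical)
print(calendar_order)
print(dfp.loc[2021, "January"])
```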
Imports and test data
import pandas as pd
from calendar import month_name # conveniently supplies a list of sorted month names or you can type them out manually
import numpy as np # for test data
# test data and dataframe
np.random.seed(365)
rows = 365 * 3
data = {'date': pd.date_range('2021-01-01', freq='D', periods=rows), 'value': np.random.randint(100, 1001, size=(rows))}
df = pd.DataFrame(data)
# select data within specified quantiles
df = df[df.value.gt(df.value.quantile(0.025)) & df.value.lt(df.value.quantile(0.975))]
# display(df.head())
date value
0 2021-01-01 694
1 2021-01-02 792
2 2021-01-03 901
3 2021-01-04 959
4 2021-01-05 528
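The quantile filter can also be written with Series.between; inclusive='neither' (available since pandas 1.3, an assumption about your version) excludes both bounds, matching the strict gt/lt comparisons. A sketch that checks both masks agree on the same kind of test data:

```python
import numpy as np
import pandas as pd

# Same kind of test data as above: one row per calendar day
np.random.seed(365)
rows = 365 * 3
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", freq="D", periods=rows),
    "value": np.random.randint(100, 1001, size=rows),
})

low, high = df.value.quantile([0.025, 0.975])

# Strict comparisons, as in the answer
mask_gt_lt = df.value.gt(low) & df.value.lt(high)
# Equivalent: between() with both bounds excluded
mask_between = df.value.between(low, high, inclusive="neither")

filtered = df[mask_between]
print(mask_gt_lt.equals(mask_between), len(df), len(filtered))
```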
Transform and plot
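- If 'date' is already set as the index, as mentioned in the comments, build the month column and the pivot from the index instead (with months defined as in the code below):
df['months'] = pd.Categorical(df.index.strftime('%B'), categories=months, ordered=True)
dfp = pd.pivot_table(data=df, index=df.index.year, columns='months', values='value')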
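A self-contained sketch of the index variant, assuming the test data is built with 'date' as a DatetimeIndex rather than a column: the month names come from df.index.strftime and the years from df.index.year, so no .dt accessor is needed. observed=True is an addition here, not part of the answer.

```python
from calendar import month_name

import numpy as np
import pandas as pd

months = month_name[1:]

# Test data with 'date' as the index instead of a column
np.random.seed(365)
rows = 365 * 3
df = pd.DataFrame(
    {"value": np.random.randint(100, 1001, size=rows)},
    index=pd.date_range("2021-01-01", freq="D", periods=rows, name="date"),
)

# With a DatetimeIndex, use the index directly instead of the .dt accessor
df["months"] = pd.Categorical(df.index.strftime("%B"), categories=months, ordered=True)
dfp = pd.pivot_table(data=df, index=df.index.year, columns="months", values="value", observed=True)

print(dfp.round(1))
```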
# create the month column
months = month_name[1:]
df['months'] = pd.Categorical(df.date.dt.strftime('%B'), categories=months, ordered=True)
# pivot the dataframe into the correct shape
dfp = pd.pivot_table(data=df, index=df.date.dt.year, columns='months', values='value')
# display(dfp.head())
months January February March April May June July August September October November December
date
2021 637.9 595.7 569.8 508.3 589.4 557.7 508.2 545.7 560.3 526.2 577.1 546.8
2022 567.9 521.5 625.5 469.8 582.6 627.3 630.4 474.0 544.1 609.6 526.6 572.1
2023 521.1 548.5 484.0 528.2 473.3 547.7 525.3 522.4 424.7 561.3 513.9 602.3
# plot
ax = dfp.plot(kind='bar', figsize=(12, 4), ylabel='Mean Page Views', xlabel='Year', rot=0)
_ = ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
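To confirm why the wide format gives grouped bars, the sketch below plots a hypothetical 3-year by 12-month frame the same way and inspects the Axes: each column (month) becomes one bar series and legend entry, and each row (year) becomes one group on the x-axis. The random values and the Agg backend are assumptions.

```python
from calendar import month_name

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

months = month_name[1:]

# Hypothetical wide frame shaped like dfp: years as rows, ordered months as columns
dfp = pd.DataFrame(
    np.random.default_rng(1).uniform(400, 700, size=(3, 12)).round(1),
    index=pd.Index([2021, 2022, 2023], name="date"),
    columns=pd.Index(months, name="months"),
)

ax = dfp.plot(kind="bar", figsize=(12, 4), ylabel="Mean Page Views", xlabel="Year", rot=0)
_ = ax.legend(bbox_to_anchor=(1, 1.02), loc="upper left")

n_series = len(ax.containers)             # one BarContainer per column (month)
bars_per_series = len(ax.containers[0])   # one bar per row (year)
legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
plt.close("all")
```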