在 Python pandas 中对齐折线图和条形图
Aligning line and bar charts in Python pandas
我正在尝试使用 Python 的 pandas 库在条形图之上绘制折线图。条形图是来自 DataFrame 的聚类图表,单独绘制时看起来像这样:
Clustered Bar Chart
折线图是整个数据集的平均值:
Line Chart
如果使用所有折线图,我可以在同一个轴对象上成功绘制两组数据:
All Line Charts
但是当我将聚类条形图的 DataFrame 切换为实际使用条形图并将其与折线图一起绘制时,折线图希望从第二个索引位置开始绘制,从而导致偏移。
Bar and Line With Offset
的回答对 matplotlib 处理条形图和折线图的 x 轴的方式有一个有趣的评论,这可能与此相关,但我不知道该怎么做这种洞察力。
解决此对齐问题的好方法是什么?
重现代码:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
table_data = {2019: {1: np.nan,
2: np.nan,
3: np.nan,
4: 1.4200000000000002,
5: 0.8193548387096775,
6: 1.420689655172414,
7: 0.4645161290322581,
8: 0.10322580645161289,
9: 0.29333333333333333,
10: 1.7741935483870968,
11: 0.32,
12: 6.703225806451614},
2020: {1: 5.52,
2: 12.613793103448277,
3: 0.9428571428571428,
4: 0.1793103448275862,
5: 0.39354838709677414,
6: 1.3866666666666667,
7: 1.9800000000000002,
8: 0.6689655172413793,
9: 0.19333333333333336,
10: 5.896774193548388,
11: 0.6896551724137931,
12: 4.103225806451613},
2021: {1: 2.7935483870967746,
2: 5.15,
3: 9.696774193548388,
4: 3.74,
5: 2.8967741935483873,
6: 0.9103448275862069,
7: 1.6516129032258065,
8: 0.3,
9: 0.38571428571428573,
10: 5.141935483870968,
11: 8.58,
12: 6.052173913043479},
2022: {1: 2.3923076923076922,
2: 31.678571428571427,
3: 8.761290322580646,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan}}
means = {1: 3.6137931034482755,
2: 16.435294117647057,
3: 7.132530120481928,
4: 1.797752808988764,
5: 1.3698924731182796,
6: 1.240909090909091,
7: 1.358695652173913,
8: 0.3522727272727273,
9: 0.28863636363636364,
10: 4.2709677419354835,
11: 3.2247191011235956,
12: 5.578823529411765}
df_bars = pd.DataFrame(table_data)
df_means = pd.DataFrame.from_dict(means, orient = 'index', columns = ['Mean'])
# Clustered bar chart by itself
df_bars.plot(kind = 'bar',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
# Line chart by itself
df_means.plot(
kind = 'line',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
y = 'Mean'
)
# Show all data as line charts. This works OK
ax_avg = df_bars.plot(kind = 'line',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
df_means.plot(
ax = ax_avg,
kind = 'line',
y = 'Mean'
)
plt.show()
# Show bar data and line chart on the one plot. The line chart is offset!
ax_avg2 = df_bars.plot(kind = 'bar',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
df_means.plot(
ax = ax_avg2,
kind = 'line',
y = 'Mean'
)
plt.show()
您可以使用 reset_index
将行数据帧的索引更改回从零开始。
这将使您的条形图与 zero-based 索引对齐,如下所示:
ax_avg2 = df_bars.plot(kind = 'bar',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
df_means.reset_index().plot(
ax = ax_avg2,
kind = 'line',
y = 'Mean'
)
输出:
我正在尝试使用 Python 的 pandas 库在条形图之上绘制折线图。条形图是来自 DataFrame 的聚类图表,单独绘制时看起来像这样:
Clustered Bar Chart
折线图是整个数据集的平均值:
Line Chart
如果使用所有折线图,我可以在同一个轴对象上成功绘制两组数据:
All Line Charts
但是当我将聚类条形图的 DataFrame 切换为实际使用条形图并将其与折线图一起绘制时,折线图希望从第二个索引位置开始绘制,从而导致偏移。
Bar and Line With Offset
解决此对齐问题的好方法是什么?
重现代码:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
table_data = {2019: {1: np.nan,
2: np.nan,
3: np.nan,
4: 1.4200000000000002,
5: 0.8193548387096775,
6: 1.420689655172414,
7: 0.4645161290322581,
8: 0.10322580645161289,
9: 0.29333333333333333,
10: 1.7741935483870968,
11: 0.32,
12: 6.703225806451614},
2020: {1: 5.52,
2: 12.613793103448277,
3: 0.9428571428571428,
4: 0.1793103448275862,
5: 0.39354838709677414,
6: 1.3866666666666667,
7: 1.9800000000000002,
8: 0.6689655172413793,
9: 0.19333333333333336,
10: 5.896774193548388,
11: 0.6896551724137931,
12: 4.103225806451613},
2021: {1: 2.7935483870967746,
2: 5.15,
3: 9.696774193548388,
4: 3.74,
5: 2.8967741935483873,
6: 0.9103448275862069,
7: 1.6516129032258065,
8: 0.3,
9: 0.38571428571428573,
10: 5.141935483870968,
11: 8.58,
12: 6.052173913043479},
2022: {1: 2.3923076923076922,
2: 31.678571428571427,
3: 8.761290322580646,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan}}
means = {1: 3.6137931034482755,
2: 16.435294117647057,
3: 7.132530120481928,
4: 1.797752808988764,
5: 1.3698924731182796,
6: 1.240909090909091,
7: 1.358695652173913,
8: 0.3522727272727273,
9: 0.28863636363636364,
10: 4.2709677419354835,
11: 3.2247191011235956,
12: 5.578823529411765}
df_bars = pd.DataFrame(table_data)
df_means = pd.DataFrame.from_dict(means, orient = 'index', columns = ['Mean'])
# Clustered bar chart by itself
df_bars.plot(kind = 'bar',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
# Line chart by itself
df_means.plot(
kind = 'line',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
y = 'Mean'
)
# Show all data as line charts. This works OK
ax_avg = df_bars.plot(kind = 'line',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
df_means.plot(
ax = ax_avg,
kind = 'line',
y = 'Mean'
)
plt.show()
# Show bar data and line chart on the one plot. The line chart is offset!
ax_avg2 = df_bars.plot(kind = 'bar',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
df_means.plot(
ax = ax_avg2,
kind = 'line',
y = 'Mean'
)
plt.show()
您可以使用 reset_index
将行数据帧的索引更改回从零开始。
这将使您的条形图与 zero-based 索引对齐,如下所示:
ax_avg2 = df_bars.plot(kind = 'bar',
title = 'Average Daily Rainfall by Month',
ylabel = 'Average Daily Rainfall (mm)',
figsize = (10, 6)
)
df_means.reset_index().plot(
ax = ax_avg2,
kind = 'line',
y = 'Mean'
)
输出: