How to add annotations made of .pct_change() data to a line plot

I have this data:

values = [["Arts & Humanities", 19.00, 13.43, 7.21, 5.11, 2.64],
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26, 30.72, 19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47],
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]]

I have made a line plot of this data:

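import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(values, columns=["Research_categories", '2017', '2018', '2019', '2020', '2021'])
data.set_index('Research_categories', inplace=True)
df = data.T  # years become the rows (x-axis), categories the columns (one line each)
plot = df.plot()  # returns the Matplotlib Axes the lines were drawn on
plt.subplots_adjust(right=0.869)

plt.show()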

Now I need to add an annotation to every yearly point. The annotation should be the percentage change, so I prepared this DataFrame:

percentage_df = data.pct_change(axis='columns')

This DataFrame looks like this:

                             2017      2018      2019      2020      2021
Research_categories
Arts & Humanities             NaN -0.293158 -0.463142 -0.291262 -0.483366
Life Sciences & Biomedicine   NaN -0.163329 -0.164780 -0.279271 -0.543157
Physical Sciences             NaN -0.145805 -0.175523 -0.358398 -0.578894
Social Sciences               NaN -0.165451 -0.192108 -0.214683 -0.535568
Technology                    NaN -0.060976 -0.256291 -0.201910 -0.495043
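For reference, pct_change(axis='columns') divides each year's value by the previous year's value in the same row and subtracts 1, which is why the first year (2017) has nothing to compare against and comes out as NaN. The minimal sketch below shows two equivalent ways to get the same table; the names manual and from_transposed are only illustrative. The second one works on the already transposed frame (years as rows), where pct_change's default row-wise direction does the job.

```python
import pandas as pd

values = [["Arts & Humanities", 19.00, 13.43, 7.21, 5.11, 2.64],
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26, 30.72, 19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47],
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]]
data = pd.DataFrame(values, columns=["Research_categories", "2017", "2018", "2019", "2020", "2021"])
data = data.set_index("Research_categories")

# Same idea as data.pct_change(axis='columns'):
# shift(axis=1) moves every value one column to the right, so each cell is
# divided by the previous year's value; 2017 has no predecessor and becomes NaN.
manual = data / data.shift(axis=1) - 1

# Same numbers from the transposed frame (years as rows), then transposed back
from_transposed = data.T.pct_change().T

print(manual.round(6))
```

Either form gives a frame with one row per category and one column per year, so it can be indexed exactly like percentage_df.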

How can I take the values from this DataFrame and show them as annotations on the plot?

I'm very new to visualization in Python, and so far this has been the trickiest part for me. Any help would be greatly appreciated. Thank you very much!

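Matplotlib has a built-in annotation function, Axes.annotate, where you only need to specify the text of the annotation and the coordinates where you want it.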

In your case, we just need to loop over both DataFrames, taking each point's y-value from data and the text to write on the chart from percentage_df.

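for i, column in enumerate(data):  # i is the x-position: 0 for '2017', 1 for '2018', ...
    if column != '2017':  # 2017 has no previous year, so its change is NaN
        for val1, val2 in zip(data[column], percentage_df[column]):
            plot.annotate(
                text=val2,  # percentage change, taken from percentage_df
                xy=(i, val1),  # must use the counter because the years are plotted as categories
            )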

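Note that because your data is technically categorical (the years are strings, not numbers), we use enumerate to get a counter, and that counter gives us the x-position of each annotation.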
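To see why the counter is needed, you can check where pandas actually puts the points. A small sketch, using two of the categories to keep it short: with a non-numeric index such as the year strings, pandas draws the points at x = 0, 1, 2, ... and only uses the strings as tick labels, so the x-coordinate for '2019' is its position in the index rather than the number 2019. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"Arts & Humanities": [19.00, 13.43, 7.21, 5.11, 2.64],
     "Technology": [52.48, 49.28, 36.65, 29.25, 14.77]},
    index=["2017", "2018", "2019", "2020", "2021"],  # years as strings, like data.T
)
ax = df.plot()

# x-data of the first line: positional values, not the year strings
x_positions = list(ax.get_lines()[0].get_xdata())
print(x_positions)

# The position of a year in the index is the x-coordinate to annotate at
x_2019 = df.index.get_loc("2019")
ax.annotate("2019 point", xy=(x_2019, df.loc["2019", "Technology"]))

plt.close(ax.figure)
```

df.index.get_loc(year) and enumerate over the columns of data give the same positions here, because both follow the order in which the years appear.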

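This produces a plot with the unrounded percentage-change value written next to every point from 2018 onward.

That meets your requirement, but it looks pretty bad. So let's make the figure bigger and round the numbers to two decimal places.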

Full code:

import pandas as pd
import matplotlib.pyplot as plt

values = [["Arts & Humanities",19.00, 13.43, 7.21, 5.11, 2.64], 
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26,  30.72,  19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47], 
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]
         ]

data = pd.DataFrame(values, columns = ["Research_categories",'2017', '2018', '2019', '2020', '2021'])
data.set_index('Research_categories', inplace=True)
df = data.T

fig, ax = plt.subplots(1,1, figsize = (8,5), dpi = 150)

df.plot(ax=ax)

percentage_df = data.pct_change(axis='columns')

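for i, column in enumerate(data):  # i is the x-position: 0 for '2017', 1 for '2018', ...
    if column != '2017':  # 2017 has no previous year, so its change is NaN
        for val1, val2 in zip(data[column], percentage_df[column]):
            ax.annotate(
                text=round(val2, 2),  # percentage change, rounded to two decimals
                xy=(i, val1),  # must use the counter because the years are plotted as categories
            )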

plt.show()
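The rounded labels are still fractions (e.g. -0.29) and start right at each data point, so they can overlap the lines. One possible refinement, sketched here under the assumption that percent labels are what you want: loop over df (years as rows) instead of data, format each change with Python's '.1%' format specifier, and nudge the text a few points away from the marker with xytext and textcoords='offset points'. The helper name annotate_pct_change is hypothetical, not part of pandas or Matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd

values = [["Arts & Humanities", 19.00, 13.43, 7.21, 5.11, 2.64],
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26, 30.72, 19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47],
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]]
data = pd.DataFrame(values, columns=["Research_categories", "2017", "2018", "2019", "2020", "2021"])
data = data.set_index("Research_categories")
df = data.T  # years as rows, categories as columns


def annotate_pct_change(ax, df):
    """Hypothetical helper: label each point of df.plot() with its change vs. the previous year."""
    pct = df.pct_change()  # same numbers as data.pct_change(axis='columns'), transposed
    annotations = []
    for x, year in enumerate(df.index):  # x matches pandas' categorical positions 0, 1, 2, ...
        for category in df.columns:
            change = pct.loc[year, category]
            if pd.isna(change):  # the first year has no previous value
                continue
            annotations.append(ax.annotate(
                f"{change:.1%}",                            # e.g. -0.293158 -> '-29.3%'
                xy=(x, df.loc[year, category]),             # the data point itself
                xytext=(4, 4), textcoords="offset points",  # shift the text off the line
                fontsize=7,
            ))
    return annotations


fig, ax = plt.subplots(figsize=(8, 5), dpi=150)
df.plot(ax=ax)
labels = annotate_pct_change(ax, df)
fig.tight_layout()
plt.show()
```

Skipping NaN values with pd.isna replaces the hard-coded check for '2017', so the same loop keeps working if the first year changes.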