Annotate some scatter plot observations

Question

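I drew a dumbbell plot in matplotlib using the example dataframe (df) and the code below.

The result looks fine, but so far I have not been able to annotate the plot with the average values from the df["avg"] column.

Can someone show me how to add each observation's average value above its respective red dot? Many thanks!

The code is as follows: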

import pandas as pd
import matplotlib.pyplot as plt

#example data
data = {'Brand': ['HC','TC','FF','AA'],
        '2019Price': [22000,25000,27000,35000],
        '2020Price': [25000, 30000, 29000, 39000]}
df = pd.DataFrame(data)
df["avg"] = (df['2019Price'] + df['2020Price'])/2
df = df.sort_values("2020Price", ascending = False)

#dumb bell plot
plt.hlines(y = df["Brand"], xmin = df["2019Price"], xmax = df["2020Price"], color = "grey", alpha = 0.4)
plt.scatter(y = df["Brand"], x = df["2019Price"], color = "blue", label = "2019")
plt.scatter(y = df["Brand"], x = df["2020Price"], color = "blue", label = "2020")
plt.scatter(y = df["Brand"], x = df["avg"], color = "red", label = "average")

plt.legend()

Answer 1

Iterate through the relevant columns, 'Brand' and 'avg', using .iterrows, and add annotations with .annotate.

See matplotlib Tutorials: Annotations.
Tested with pandas 1.3.1 and matplotlib 3.4.2.

import pandas as pd
import matplotlib.pyplot as plt

data = {'Brand': ['HC','TC','FF','AA'],
        '2019Price': [22000,25000,27000,35000],
        '2020Price':[25000, 30000, 29000, 39000]}

df = pd.DataFrame(data)

df["avg"] = df[['2019Price', '2020Price']].mean(axis=1)

df = df.sort_values("2020Price", ascending = False)

fig, ax = plt.subplots(figsize=(8, 6))

ax.hlines(y=df["Brand"], xmin=df["2019Price"], xmax=df["2020Price"], color="grey", alpha=0.4)

ax.scatter(y=df["Brand"], x=df["2019Price"], color="blue", label="2019")
ax.scatter(y=df["Brand"], x=df["2020Price"], color="blue", label="2020")
ax.scatter(y=df["Brand"], x=df["avg"], color="red", label="average")

_ = ax.legend()

# add annotations for average
for i, (j, k) in df[['Brand', 'avg']].iterrows():
    ax.annotate(f'{k:0.0f}', xy=(k, j), xytext=(-15, 5), textcoords='offset points')
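
The loop above offsets each label 15 points to the left of the red dot. Since the question asks for the value directly above the dot, the following is a minimal sketch of one possible variant: it centers the text horizontally with ha='center' and pairs the columns with zip instead of .iterrows. The offset of 8 points and the Agg backend line are assumptions made for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

data = {'Brand': ['HC', 'TC', 'FF', 'AA'],
        '2019Price': [22000, 25000, 27000, 35000],
        '2020Price': [25000, 30000, 29000, 39000]}
df = pd.DataFrame(data)
df["avg"] = df[['2019Price', '2020Price']].mean(axis=1)
df = df.sort_values("2020Price", ascending=False)

fig, ax = plt.subplots(figsize=(8, 6))
ax.hlines(y=df["Brand"], xmin=df["2019Price"], xmax=df["2020Price"], color="grey", alpha=0.4)
ax.scatter(y=df["Brand"], x=df["avg"], color="red", label="average")

# pair each brand with its average; the label is centered 8 points above the dot
for brand, avg in zip(df["Brand"], df["avg"]):
    ax.annotate(f'{avg:0.0f}', xy=(avg, brand), xytext=(0, 8),
                textcoords='offset points', ha='center')

# the annotations are stored on the Axes, which makes them easy to inspect
labels = [t.get_text() for t in ax.texts]
```

Because textcoords='offset points' measures the offset in points rather than data units, the label should stay the same distance above the dot when the figure is resized or the axis limits change.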

Alternatively, use pandas.DataFrame.plot to create the scatter plot. This uses matplotlib as the backend, so matplotlib does not need to be imported separately.

import pandas as pd

data = {'Brand': ['HC','TC','FF','AA'],
        '2019Price': [22000,25000,27000,35000],
        '2020Price':[25000, 30000, 29000, 39000]}

df = pd.DataFrame(data)

df["avg"] = df[['2019Price', '2020Price']].mean(axis=1)

df = df.sort_values("2020Price", ascending = False)

ax = df.plot(kind='scatter', y='Brand', x='2019Price', c='b', label='2019', figsize=(8, 6))
df.plot(kind='scatter', y='Brand', x='2020Price', c='b', label='2020', ax=ax)
df.plot(kind='scatter', y='Brand', x='avg', c='r', label='average', ax=ax)

ax.hlines(y=df["Brand"], xmin=df["2019Price"], xmax=df["2020Price"], color="grey", alpha=0.4)

for i, (j, k) in df[['Brand', 'avg']].iterrows():
    ax.annotate(f'{k:0.0f}', xy=(k, j), xytext=(-15, 5), textcoords='offset points')
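
Both versions repeat the same annotation loop, so it may be convenient to wrap it in a small function. The sketch below is only an illustration: annotate_points is a hypothetical helper name, and its fmt and offset parameters are assumptions rather than anything from pandas or matplotlib. It accepts any matplotlib Axes, including the one returned by df.plot; the demo uses ax.scatter to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd


def annotate_points(ax, xs, ys, fmt='{:,.0f}', offset=(0, 8)):
    """Hypothetical helper: label each (x, y) point with its formatted x value."""
    texts = []
    for x, y in zip(xs, ys):
        texts.append(ax.annotate(fmt.format(x), xy=(x, y), xytext=offset,
                                 textcoords='offset points', ha='center'))
    return texts


data = {'Brand': ['HC', 'TC', 'FF', 'AA'],
        '2019Price': [22000, 25000, 27000, 35000],
        '2020Price': [25000, 30000, 29000, 39000]}
df = pd.DataFrame(data)
df["avg"] = df[['2019Price', '2020Price']].mean(axis=1)
df = df.sort_values("2020Price", ascending=False)

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(y=df["Brand"], x=df["avg"], color="red", label="average")

# the default format adds thousands separators, e.g. 37,000
avg_labels = [t.get_text() for t in annotate_points(ax, df["avg"], df["Brand"])]
```

The same call could label the 2019 and 2020 endpoints by passing df["2019Price"] or df["2020Price"] as xs, possibly with a different offset so the labels do not overlap.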

