Pandas boxplot, also including the most recent value with a marker

I have a time series covering 5 categories (different types of oil), and I want to display it as a boxplot.

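How can I show the most recent value as a marker on the corresponding box? In my example, the variable maxDate is used to pick out the most recent value for each oil type.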

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read Data Files, create data frame for all products
dfBr = pd.read_excel (r'\filepath.xlsx',
                   skiprows=1,
                   usecols=(0,1,13,14))
dfCb = pd.read_excel (r'\filepath.xlsx',
                   skiprows=1,
                   usecols=(0,1,13,14))
sns.set_style('whitegrid')
total = [dfBr,dfCb]
df = pd.concat(total)

df.columns =['Commodity', 'Date', 'mmLong', 'mmShort']
df.tail() 

df['Net_OI']=df['mmLong']-df['mmShort']
df['LS_Ratio']=df['mmLong']/df['mmShort']

df=df[df['Date'] > 180600]

df['Commodity'] = df['Commodity'].replace(['CRUDE OIL, LIGHT SWEET - NEW YORK MERCANTILE EXCHANGE',
                                           'ICE Brent Crude Futures - ICE Futures Europe',
                                           'CRUDE OIL, LIGHT SWEET-WTI - ICE FUTURES EUROPE',
                                           'GASOLINE BLENDSTOCK (RBOB)  - NEW YORK MERCANTILE EXCHANGE',
                                           '#2 HEATING OIL- NY HARBOR-ULSD - NEW YORK MERCANTILE EXCHANGE']
                                          ,['WTI',
                                            'BRN',
                                            'ICE',
                                            'RBOB',
                                            'HO'])
maxDate = df.Date.iloc[-1]

currentWTI = df.loc[ (df['Commodity'] == 'WTI') & (df['Date'] == maxDate)]
currentBrn = df.loc[ (df['Commodity'] == 'BRN') & (df['Date'] == maxDate)]
currentIce = df.loc[ (df['Commodity'] == 'ICE') & (df['Date'] == maxDate)]
currentRb = df.loc[ (df['Commodity'] == 'RBOB') & (df['Date'] == maxDate)]
currentHo = df.loc[ (df['Commodity'] == 'HO') & (df['Date'] == maxDate)]


fig, ax = plt.subplots(figsize=(8,8))
#sns.boxplot(x='Net_OI', y='Market_and_Exchange_Names', data=three_yr_df);
sns.boxplot(x=df.LS_Ratio, y=df.Commodity);

plt.scatter(currentBrn.LS_Ratio, 0,marker='*', s=350, color='orange');

The end result currently looks like this, but I would like to show the boxplot for all 5 items, with a star marker on each of the 5 boxes.

Any help would be appreciated.

[Image: Boxplot with 1 marker]

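Since no data was provided, I used the boxplot example from the official reference to code up your task. The key is to compute one value per category (here, the maximum within each category) and use that array to position the stars.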

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")

fig, ax = plt.subplots()
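ax = sns.boxplot(x="day", y="total_bill", data=tips)
# One value per category: here, the maximum total_bill for each day
star = tips[['day','total_bill']].groupby('day').max()
# Draw one star per box at that value
ax.scatter(star.index, star.total_bill, marker='*', s=350, color='orange')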
plt.show()
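
The example above marks the maximum of each category. If the marker should show the most recent value instead, as the question asks, one possible variation is to pick the row with the latest date per category and plot those values. The following is only a sketch: the data is synthetic (made-up numbers standing in for the question's Excel files), the column names Commodity, Date and LS_Ratio are taken from the question, and the helper latest_per_group is a hypothetical name introduced here for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for the question's data; the values are made up.
rng = np.random.default_rng(0)
commodities = ["WTI", "BRN", "ICE", "RBOB", "HO"]
dates = pd.date_range("2020-01-07", periods=50, freq="W-TUE")
demo = pd.DataFrame({
    "Commodity": np.repeat(commodities, len(dates)),
    "Date": np.tile(dates, len(commodities)),
    "LS_Ratio": rng.lognormal(mean=1.0, sigma=0.4,
                              size=len(commodities) * len(dates)),
})

def latest_per_group(df, group_col="Commodity", date_col="Date"):
    """Hypothetical helper: return the row with the latest date in each group."""
    idx = df.groupby(group_col)[date_col].idxmax()
    return df.loc[idx]

latest = latest_per_group(demo)

sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(8, 8))
# Horizontal boxplot as in the question; `order` fixes which box is on which row.
sns.boxplot(data=demo, x="LS_Ratio", y="Commodity", order=commodities, ax=ax)

# The boxes sit at y = 0, 1, 2, ... following `order`,
# so map each commodity name to its row position.
ypos = latest["Commodity"].map({name: i for i, name in enumerate(commodities)})
ax.scatter(latest["LS_Ratio"], ypos, marker="*", s=350,
           color="orange", zorder=10)
```

Calling plt.show() afterwards displays the figure. Passing order explicitly and mapping names to numeric positions keeps each star aligned with its box regardless of the order in which the groupby returns the categories.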