Pandas boxplot: also show the most recent value with a marker
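I have a boxplot that I want to draw from a time series with 5 different categorical variables (different types of oil).
How can I show the most recent value with a marker on the corresponding box? In my example, I have the variable maxDate to pick out the most recent value for each oil type.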
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.subplots / plt.scatter below

# read Data Files, create data frame for all products
dfBr = pd.read_excel(r'\filepath.xlsx',
                     skiprows=1,
                     usecols=(0, 1, 13, 14))
dfCb = pd.read_excel(r'\filepath.xlsx',
                     skiprows=1,
                     usecols=(0, 1, 13, 14))
sns.set_style('whitegrid')
total = [dfBr, dfCb]
df = pd.concat(total)
df.columns = ['Commodity', 'Date', 'mmLong', 'mmShort']
df.tail()
df['Net_OI'] = df['mmLong'] - df['mmShort']
df['LS_Ratio'] = df['mmLong'] / df['mmShort']
df = df[df['Date'] > 180600]
# shorten the long market names to tickers
df['Commodity'] = df['Commodity'].replace(
    ['CRUDE OIL, LIGHT SWEET - NEW YORK MERCANTILE EXCHANGE',
     'ICE Brent Crude Futures - ICE Futures Europe',
     'CRUDE OIL, LIGHT SWEET-WTI - ICE FUTURES EUROPE',
     'GASOLINE BLENDSTOCK (RBOB) - NEW YORK MERCANTILE EXCHANGE',
     '#2 HEATING OIL- NY HARBOR-ULSD - NEW YORK MERCANTILE EXCHANGE'],
    ['WTI',
     'BRN',
     'ICE',
     'RBOB',
     'HO'])
# most recent date, then the row for each commodity on that date
maxDate = df.Date.iloc[-1]
currentWTI = df.loc[(df['Commodity'] == 'WTI') & (df['Date'] == maxDate)]
currentBrn = df.loc[(df['Commodity'] == 'BRN') & (df['Date'] == maxDate)]
currentIce = df.loc[(df['Commodity'] == 'ICE') & (df['Date'] == maxDate)]
currentRb = df.loc[(df['Commodity'] == 'RBOB') & (df['Date'] == maxDate)]
currentHo = df.loc[(df['Commodity'] == 'HO') & (df['Date'] == maxDate)]
fig, ax = plt.subplots(figsize=(8, 8))
#sns.boxplot(x='Net_OI', y='Market_and_Exchange_Names', data=three_yr_df);
sns.boxplot(x=df.LS_Ratio, y=df.Commodity);
plt.scatter(currentBrn.LS_Ratio, 0, marker='*', s=350, color='orange');
At the moment the end result looks like this, but I want to show the boxplots for all 5 items, with a star marker on each of the 5.
Any help would be appreciated.
[Image: Boxplot with 1 marker]
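Since no data was provided, I coded your task using the boxplot from the official reference as an example. The key is to take the maximum value within each category and use those values as the array of positions for the stars.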
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")

fig, ax = plt.subplots()
ax = sns.boxplot(x="day", y="total_bill", data=tips)
# one value per category: here the maximum total_bill for each day
star = tips[['day','total_bill']].groupby('day').max()
# draw a star on each box at that category's value
ax.scatter(star.index, star.total_bill, marker='*', s=350, color='orange')
plt.show()
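The example above marks the maximum of each category, while the question asks for the most recent value. Below is a minimal sketch of that variant, not a definitive implementation. It assumes a long-format frame with the column names from the question (`Commodity`, `Date`, `LS_Ratio`). The helper names `latest_per_category` and `boxplot_with_latest` are hypothetical, and the data is randomly generated to stand in for the Excel files. It also assumes seaborn places the categories at positions 0, 1, 2, ... in the order given by `order`, which is why the stars are plotted at numeric y positions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def latest_per_category(df, cat_col, date_col):
    """Return the row with the most recent date for each category."""
    # Sort by date, then keep the last row of each group. This also works
    # when the index contains duplicates (e.g. after pd.concat).
    return df.sort_values(date_col).groupby(cat_col).tail(1)


def boxplot_with_latest(df, cat_col, date_col, value_col, order):
    """Horizontal boxplot per category with a star at the latest value."""
    latest = latest_per_category(df, cat_col, date_col).set_index(cat_col)
    fig, ax = plt.subplots(figsize=(8, 8))
    sns.boxplot(data=df, x=value_col, y=cat_col, order=order, ax=ax)
    # The box for order[i] is drawn at y = i, so the star goes there too.
    ax.scatter(latest.loc[order, value_col], np.arange(len(order)),
               marker="*", s=350, color="orange", zorder=10)
    return ax, latest


# Made-up data standing in for the Excel files: 5 commodities x 50 weeks.
rng = np.random.default_rng(0)
commodities = ["WTI", "BRN", "ICE", "RBOB", "HO"]
dates = pd.date_range("2020-01-07", periods=50, freq="7D")
df = pd.DataFrame({
    "Commodity": np.repeat(commodities, len(dates)),
    "Date": np.tile(dates, len(commodities)),
    "LS_Ratio": rng.uniform(1.0, 5.0, size=len(commodities) * len(dates)),
})

ax, latest = boxplot_with_latest(df, "Commodity", "Date", "LS_Ratio",
                                 commodities)
```

Call `plt.show()` afterwards to display the figure. Because the latest row is taken per group, this also covers the case where the commodities do not all share the same final date, which a single `maxDate` filter would miss.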