fill_between where condition: an easy but puzzling problem

I have a df covering 2002 to 2017 that looks like this:

link to gdrive with .csv

Team,Finish.1_x,Finish.1_y,Win%Detroit,Win%Chicago,Date,GdpDetroit,GdpChicago,D_GDP_Change,C_GDP_Change,detroitdummy,chicagodummy
2002–03,1st,6th,0.61,0.366,2002-01-01,49650,54744,933.0,-27.0,1,0
2003–04,2nd,8th,0.659,0.28,2003-01-01,51101,55273,1451.0,529.0,1,1
2004–05,1st,2nd,0.659,0.573,2004-01-01,50935,56507,-166.0,1234.0,0,1
2005–06,1st,4th[k],0.78,0.5,2005-01-01,52028,57608,1093.0,1101.0,1,1
2006–07,1st,3rd,0.646,0.598,2006-01-01,50576,58717,-1452.0,1109.0,0,1
2007–08,1st,4th,0.72,0.402,2007-01-01,50450,59240,-126.0,523.0,0,1
2008–09,3rd,2nd,0.476,0.5,2008-01-01,47835,57197,-2615.0,-2043.0,0,0
2009–10,5th,3rd,0.329,0.5,2009-01-01,43030,54802,-4805.0,-2395.0,0,0
2010–11,4th,1st,0.366,0.756,2010-01-01,45735,55165,2705.0,363.0,1,1
2012–13,4th,2nd,0.354,0.549,2012-01-01,48469,57254,926.0,1463.0,1,1
2013–14,4th,2nd,0.354,0.585,2013-01-01,48708,56939,239.0,-315.0,1,0
2014–15,5th,2nd,0.39,0.61,2014-01-01,49594,57823,886.0,884.0,1,1
2015–16,3rd,4th,0.537,0.512,2015-01-01,50793,59285,1199.0,1462.0,1,1
2016–17,5th,4th,0.451,0.5,2016-01-01,51578,60191,785.0,906.0,1,1
2017–18,4th,5th,0.476,0.329,2017-01-01,52879,61170,1301.0,979.0,1,1

I'm trying to make a chart like this:

x is the Date column, y is the win percentage, and fill_between is conditioned on the sign of the GDP change: blue where GDP grew, red where it fell. The code I used:

fig, ax = plt.subplots()
ax.fill_between(df["Date"],np.max(df["Win%Detroit"]), where=(np.sign(df["D_GDP_Change"])<= 0), color='r', alpha=.1)
ax.fill_between(df["Date"],np.max(df["Win%Detroit"]), where=(np.sign(df["D_GDP_Change"])> 0), color='b', alpha=.1)
ax.plot(df["Date"],df["Win%Detroit"])
plt.show()

I can't figure out what I'm doing wrong, because the filled patches have gaps. Can you give me a hint?

I think the answer is here: conditional matplotlib fill_between for dataframe. However, I can't work out how to apply that function cleanly in this case.

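The where= version of fill_between that you are using only fills between two consecutive dates when the condition is true at both of them. Wherever the sign of the GDP change flips from one date to the next, neither the red nor the blue call fills that interval, which leaves the gaps.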
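To see where the gaps come from, the rule above can be mimicked in plain Python without any plotting. This is only a sketch: filled_intervals is a hypothetical helper written for illustration (it is not part of matplotlib), and the years and GDP changes are copied from the table in the question.

```python
# Years and Detroit GDP changes copied from the table in the question.
years = [2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2012, 2013, 2014, 2015, 2016, 2017]
gdp_change = [933, 1451, -166, 1093, -1452, -126, -2615, -4805,
              2705, 926, 239, 886, 1199, 785, 1301]


def filled_intervals(x, mask):
    """Hypothetical helper: intervals (x[i], x[i+1]) that are filled
    under the rule 'fill only if mask[i] and mask[i+1] are both true'."""
    return [(x[i], x[i + 1]) for i in range(len(x) - 1)
            if mask[i] and mask[i + 1]]


# Same two conditions as the two fill_between calls in the question.
blue = filled_intervals(years, [g > 0 for g in gdp_change])
red = filled_intervals(years, [g <= 0 for g in gdp_change])

# Any interval claimed by neither colour shows up as a gap in the plot.
all_intervals = list(zip(years[:-1], years[1:]))
gaps = [iv for iv in all_intervals if iv not in blue and iv not in red]
print(gaps)
```

Under that rule, every interval whose two end dates have GDP changes of opposite sign ends up in gaps.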

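To get a background fill that depends on the condition, you can fill under a step function instead, letting it step between 0 and the maximum value: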

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame({"Date": pd.date_range('20020101', '20170101', freq='YS'),
                   "Win%Detroit": np.random.uniform(0.3, 0.7, 16),
                   "D_GDP_Change": np.random.randint(-2000, 2000, 16)})

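fig, ax = plt.subplots()
# Height of the background bands: slightly above the tallest data point.
max_val = np.max(df["Win%Detroit"]) * 1.05
# Each band is max_val where its condition holds and 0 elsewhere;
# step='post' holds that value from one date until the next.
ax.fill_between(df["Date"], np.where(df["D_GDP_Change"] <= 0, max_val, 0), color='r', alpha=.1, step='post')
ax.fill_between(df["Date"], np.where(df["D_GDP_Change"] > 0, max_val, 0), color='b', alpha=.1, step='post')
ax.plot(df["Date"], df["Win%Detroit"])
# Remove the axis padding so the bands reach the edges of the plot.
ax.margins(x=0, y=0)
plt.show()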

Using the dataframe from the post, with the 'Date' column converted to a datetime type (df['Date'] = pd.to_datetime(df['Date'])):