Plotly: How to handle missing dates for a financial time series?

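Financial time series are often fraught with missing data. Out of the box, plotly handles a series with missing timestamps visually by simply drawing a single line through the remaining points. The challenge is that plotly interprets the timestamps as values on a continuous axis, so every missing date still takes up space in the figure.

Most of the time, I find that the plot looks better when those dates are left out completely. An example from the plotly docs under https://plotly.com/python/time-series/#hiding-weekends-and-holidays shows how to handle missing dates for some date categories, such as weekends or holidays: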

fig.update_xaxes(
    rangebreaks=[
        dict(bounds=["sat", "mon"]),  # hide weekends
        dict(values=["2015-12-25", "2016-01-01"])  # hide Christmas and New Year's
    ]
)

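The downside is that your dataset may just as well be missing data for any other weekday. You would also have to specify the holiday dates for each country yourself. Are there any other approaches?

Reproducible code: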

import pandas as pd
import numpy as np
import plotly.graph_objects as go

# data
np.random.seed(1234)
n_obs = 15
frequency = 'D'
daterange = pd.date_range('2020', freq=frequency, periods=n_obs)
values = np.random.randint(low=-5, high=6, size=n_obs).tolist()
df = pd.DataFrame({'time':daterange, 'value':values})
df = df.set_index('time')
df.iloc[0]=100; df['value']=df.value.cumsum()

# Missing timestamps
df.iloc[2:5] = np.nan; df.iloc[8:13] = np.nan
df.dropna(inplace = True)

# plotly figure
fig=go.Figure(go.Scatter(x=df.index, y =df['value']))
fig.update_layout(template = 'plotly_dark')
fig.show()

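The key here is still the rangebreaks attribute. If you followed the approach in the linked example, though, you would have to list each missing date manually. The solution to missing data in this case is actually more missing data. Here is why:

1. You can retrieve the timestamps at the beginning and the end of your series, and then

2. build a complete timeline within that period (one that may contain more dates than you actually observed) using: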

dt_all = pd.date_range(start=df.index[0],
                       end=df.index[-1],
                       freq = 'D')

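3. Next, you can isolate the timestamps in that timeline that are not in df.index (dt_all_py and dt_obs_py are dt_all and df.index converted to Python datetimes, as in the complete code below) using: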

dt_breaks = [d for d in dt_all_py if d not in dt_obs_py]

4. Finally, you can include those timestamps in rangebreaks like so:

fig.update_xaxes(
    rangebreaks=[dict(values=dt_breaks)]
)
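Steps 2 and 3 can also be sketched with a pandas set operation instead of a list comprehension. This is a minimal sketch rather than part of the original answer: missing_dates is a hypothetical helper name, and it assumes a non-empty DatetimeIndex of daily observations.

```python
import pandas as pd


def missing_dates(index: pd.DatetimeIndex) -> list:
    """Hypothetical helper: dates between the first and last
    observation that do not appear in `index` (daily data assumed)."""
    # complete daily timeline between the first and last observed timestamps
    full = pd.date_range(start=index.min(), end=index.max(), freq="D")
    # timeline entries that were never observed
    missing = full.difference(index)
    # plain date strings, the same form used in the plotly docs example
    return missing.strftime("%Y-%m-%d").tolist()


# 15 daily timestamps with the same two gaps as the reproducible code
observed = pd.date_range("2020", freq="D", periods=15)
observed = observed.delete(list(range(2, 5)) + list(range(8, 13)))

dt_breaks = missing_dates(observed)
print(dt_breaks)
```

The resulting list can be passed to rangebreaks=[dict(values=dt_breaks)] in the same way as in step 4.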

Complete code:

import pandas as pd
import numpy as np
import plotly.graph_objects as go

# data
np.random.seed(1234)
n_obs = 15
frequency = 'D'
daterange = pd.date_range('2020', freq=frequency, periods=n_obs)
values = np.random.randint(low=-5, high=6, size=n_obs).tolist()
df = pd.DataFrame({'time':daterange, 'value':values})
df = df.set_index('time')
df.iloc[0]=100; df['value']=df.value.cumsum()

# Missing timestamps
df.iloc[2:5] = np.nan; df.iloc[8:13] = np.nan
df.dropna(inplace = True)

# plotly figure
fig=go.Figure(go.Scatter(x=df.index, y =df['value']))
fig.update_layout(template = 'plotly_dark')

# complete timeline between first and last timestamps
dt_all = pd.date_range(start=df.index[0],
                       end=df.index[-1],
                       freq = frequency)
                        
# make sure input and synthetic time series are of the same types
dt_all_py = [d.to_pydatetime() for d in dt_all]
dt_obs_py = [d.to_pydatetime() for d in df.index]

# find which timestamps are missing in the complete timeline
dt_breaks = [d for d in dt_all_py if d not in dt_obs_py]

# remove missing timestamps from visualization
fig.update_xaxes(
    rangebreaks=[dict(values=dt_breaks)] # hide timestamps with no values
)
#fig.update_layout(title=dict(text="Some dates are missing, but still displayed"))
fig.update_layout(title=dict(text="Missing dates are excluded by rangebreaks"))
fig.update_xaxes(showgrid=False)
fig.show()
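The complete code keeps frequency as a variable, but as far as I understand each entry in values hides one full day by default. For intraday data, the dvalue attribute of a rangebreak (the size of each values item in milliseconds) would presumably need to match the sampling interval. The following is a sketch under that assumption: build_rangebreak is a hypothetical helper name, and the hourly data is made up for illustration.

```python
import pandas as pd


def build_rangebreak(index: pd.DatetimeIndex, step: pd.Timedelta) -> dict:
    """Hypothetical helper: one rangebreak dict that hides every
    missing `step`-sized slot between the first and last observation."""
    # complete timeline at the sampling interval
    full = pd.date_range(start=index.min(), end=index.max(), freq=step)
    # slots with no observation
    missing = full.difference(index)
    return dict(
        values=missing.strftime("%Y-%m-%d %H:%M:%S").tolist(),
        # size of each hidden slot in milliseconds (assumed to match `step`)
        dvalue=int(step / pd.Timedelta(milliseconds=1)),
    )


# six hourly timestamps, with 11:00 and 12:00 removed
one_hour = pd.Timedelta(hours=1)
hourly = pd.date_range("2020-01-01 09:00", periods=6, freq=one_hour)
observed = hourly.delete([2, 3])

rangebreak = build_rangebreak(observed, one_hour)
print(rangebreak)
```

The dict would then go into fig.update_xaxes(rangebreaks=[rangebreak]); it is worth checking the rendered figure, since this has not been verified against every plotly version.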

You can use the dtick attribute. Set the tick interval to one day via dtick, which is defined in milliseconds for date axes, e.g. 86400000. See the following code:

fig.update_xaxes(dtick=86400000)
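To avoid hard-coding the number, the millisecond value can be derived from a timedelta. A minimal sketch using only the standard library; tick_ms is a hypothetical helper name, not a plotly function.

```python
from datetime import timedelta


def tick_ms(interval: timedelta) -> int:
    """Hypothetical helper: convert a time interval to whole milliseconds."""
    return int(interval.total_seconds() * 1000)


ONE_DAY_MS = tick_ms(timedelta(days=1))
ONE_WEEK_MS = tick_ms(timedelta(weeks=1))
print(ONE_DAY_MS, ONE_WEEK_MS)
```

ONE_DAY_MS can then be used as fig.update_xaxes(dtick=ONE_DAY_MS).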