Plot values of two time series with different dates

Question

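I have two time series. Each one (s1 and s2) is represented by a list of values and a corresponding list of times (e.g. timestamps or similar). I'm using Python, and for example I have: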

s1_values = [6,8,6,3,7,9] # len(s1_values) == len(s1_times)
s1_times =  [1,3,6,7,8,12]

s2_values = [3,8,7,2,5,4,6,2] # len(s2_values) == len(s2_times)
s2_times =  [2,4,5,7,8,9,10,13]

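I want to see the relationship between the two time series s1 and s2, so I would like to plot s1_values (on the x-axis) against s2_values (on the y-axis) using Matplotlib. But since the two time series are not aligned in time, I don't know how to do it.

Maybe there are some common approaches for time series, but I'm not aware of them.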

Answer 1

You can use pandas (see its docs), which is very useful for time series data. In this case, you make two DataFrames, then merge and sort them.

merge gives you a combined "Time" column (there are many different ways to merge; see the merge docs), inserting NaN values into the value columns where there isn't a value for that time. The result is then sorted by the shared Time column. The df.fillna function (docs) accepts a method parameter: ffill or pad fills gaps with the last valid value, and bfill fills with the next valid value. In recent pandas versions this parameter is deprecated in favor of df.ffill() and df.bfill(). Alternatively, you can use df.interpolate for linear interpolation of missing values (docs).

Conveniently, pandas wraps matplotlib, so you can plot directly from the DataFrame.
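To make the effect of the outer merge concrete, here is a minimal sketch using the data from the question. The variable names s1, s2 and merged are chosen for illustration only. It just builds the merged table, so you can see where the NaN values land before any filling is applied.

```python
import pandas as pd

# One DataFrame per series, each with its own Time column
s1 = pd.DataFrame({"Time": [1, 3, 6, 7, 8, 12],
                   "s1 values": [6, 8, 6, 3, 7, 9]})
s2 = pd.DataFrame({"Time": [2, 4, 5, 7, 8, 9, 10, 13],
                   "s2 values": [3, 8, 7, 2, 5, 4, 6, 2]})

# how="outer" keeps every time that appears in either series;
# sort=True orders the result by the join key (Time)
merged = s1.merge(s2, how="outer", on="Time", sort=True)

# Rows where only one series has an observation hold NaN in the other column
print(merged)
```

Only times 7 and 8 appear in both series, so every other row has a NaN in one of the two value columns.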

import matplotlib.pyplot as plt
import pandas as pd


s1_values = [6,8,6,3,7,9] 
s1_times =  [1,3,6,7,8,12]

s2_values = [3,8,7,2,5,4,6,2]
s2_times =  [2,4,5,7,8,9,10,13]

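# Build one DataFrame per series; list(zip(...)) pairs each time with its value
df1 = pd.DataFrame(list(zip(s1_times, s1_values)), columns=['Time', 's1 values'])
df2 = pd.DataFrame(list(zip(s2_times, s2_values)), columns=['Time', 's2 values'])

# Outer merge keeps every time from both series; sort expects a boolean
df = df1.merge(df2, how='outer', on='Time', sort=True)
# Forward-fill gaps with the last valid value (same as fillna(method='pad'),
# which is deprecated in recent pandas); or use df = df.interpolate()
df = df.ffill()

# Rows that still contain NaN (no earlier value to carry forward) are not drawn
df.plot(kind='scatter', x='s1 values', y='s2 values')
plt.show()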

Figure: result using fillna(method='ffill')

Figure: result using interpolate()
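For a numeric comparison of the two fill strategies, the sketch below runs both on the merged table. The names merged, filled and interpolated are illustrative, and method="index" is an assumption on my part: it interpolates linearly with respect to the actual Time values, whereas the plain interpolate() call above treats rows as equally spaced.

```python
import pandas as pd

df1 = pd.DataFrame({"Time": [1, 3, 6, 7, 8, 12],
                    "s1 values": [6, 8, 6, 3, 7, 9]})
df2 = pd.DataFrame({"Time": [2, 4, 5, 7, 8, 9, 10, 13],
                    "s2 values": [3, 8, 7, 2, 5, 4, 6, 2]})

# Use Time as the index so interpolation can take the time gaps into account
merged = df1.merge(df2, how="outer", on="Time", sort=True).set_index("Time")

# Strategy 1: carry the last observed value forward
filled = merged.ffill()

# Strategy 2: linear interpolation weighted by the Time index (assumed choice)
interpolated = merged.interpolate(method="index")

# Compare the two strategies for s1, which has no observation at Time 2 or 9
print(filled.loc[[2, 9], "s1 values"])
print(interpolated.loc[[2, 9], "s1 values"])
```

Forward fill gives a step-like series, which suits values that hold until the next observation. Interpolation assumes the value changes smoothly between observations. Which one is appropriate depends on what the series measures.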
