Compute the correlation between the intersection of two timeseries with pandas.Series

Question

Consider two time series, each a pandas.Series:

tser_a:

date

2016-05-25 13:30:00.023   50.41
2016-05-26 13:30:00.023   51.96
2016-05-27 13:30:00.030   51.98
2016-05-28 13:30:00.041   52.00
2016-05-29 13:30:00.048   52.01
2016-06-02 13:30:00.049   51.97
2016-06-03 13:30:00.072   52.01
2016-06-04 13:30:00.075   52.10

tser_b:

date

2016-05-24 13:30:00.023   74.41
2016-05-25 13:30:00.023   74.96
2016-05-26 13:30:00.030   74.98
2016-05-27 13:30:00.041   73.00
2016-05-28 13:30:00.048   73.01
2016-05-29 13:30:00.049   73.97
2016-06-02 13:30:00.072   72.01
2016-06-03 13:30:00.075   72.10
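
For reproducibility, the two printouts above could be rebuilt roughly as follows. This is only a sketch: the values and timestamps are copied from the listings, while the float dtype and the use of "date" as the index name are assumptions read off the printed header.

```python
import pandas as pd

# Timestamps and values copied from the tser_a listing above.
tser_a = pd.Series(
    [50.41, 51.96, 51.98, 52.00, 52.01, 51.97, 52.01, 52.10],
    index=pd.DatetimeIndex(
        [
            "2016-05-25 13:30:00.023",
            "2016-05-26 13:30:00.023",
            "2016-05-27 13:30:00.030",
            "2016-05-28 13:30:00.041",
            "2016-05-29 13:30:00.048",
            "2016-06-02 13:30:00.049",
            "2016-06-03 13:30:00.072",
            "2016-06-04 13:30:00.075",
        ],
        name="date",  # assumed from the "date" header in the printout
    ),
)

# Timestamps and values copied from the tser_b listing above.
tser_b = pd.Series(
    [74.41, 74.96, 74.98, 73.00, 73.01, 73.97, 72.01, 72.10],
    index=pd.DatetimeIndex(
        [
            "2016-05-24 13:30:00.023",
            "2016-05-25 13:30:00.023",
            "2016-05-26 13:30:00.030",
            "2016-05-27 13:30:00.041",
            "2016-05-28 13:30:00.048",
            "2016-05-29 13:30:00.049",
            "2016-06-02 13:30:00.072",
            "2016-06-03 13:30:00.075",
        ],
        name="date",
    ),
)
```

Note that the timestamps carry milliseconds, so the same calendar day usually has different index labels in the two series.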

I would like to compute the correlation between these two time series.

pandas does provide the pandas.Series.corr method for computing this value.

corr = tser_a.corr(tser_b)

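My question:

However, I need to be sure that the correlation pairs values taken on exactly the same date, so that only the intersection of tser_a and tser_b is considered.

As pseudocode: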

if (tser_a[date_x] IS NOT NIL) AND (tser_b[date_x] IS NOT NIL):
    consider(tser_a[date_x], tser_b[date_x])
else:
    skip and go ahead

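So the rows that have no counterpart in the other series:

tser_b -> 2016-05-24 13:30:00.023   74.41
tser_a -> 2016-06-04 13:30:00.075   52.10

must be excluded.

Does pandas.Series.corr assume this behaviour by default, or should I first intersect the two time series by date?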

Answer 1

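It looks like tser_a.corr(tser_b) does match on the index, so only labels present in both series are used. However, because the timestamps in the two series include milliseconds and are mostly not identical, exact matching pairs up very few rows and can give unexpected results. Instead, you can resample to daily frequency first: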

tser_a.resample('D').mean().corr(tser_b.resample('D').mean())
# out -0.5522781562573792
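
To make the difference concrete, here is a minimal sketch that rebuilds the data from the question and compares exact-timestamp alignment with day-level alignment. The helper `make_series` and the variable names are made up for illustration. The second option (truncating timestamps with `DatetimeIndex.normalize()` and intersecting the indexes explicitly) is an alternative to resampling rather than part of the answer above, and it assumes at most one observation per day in each series.

```python
import pandas as pd

def make_series(stamps, values):
    """Hypothetical helper: build a Series indexed by parsed timestamps."""
    return pd.Series(values, index=pd.DatetimeIndex(stamps, name="date"))

tser_a = make_series(
    ["2016-05-25 13:30:00.023", "2016-05-26 13:30:00.023",
     "2016-05-27 13:30:00.030", "2016-05-28 13:30:00.041",
     "2016-05-29 13:30:00.048", "2016-06-02 13:30:00.049",
     "2016-06-03 13:30:00.072", "2016-06-04 13:30:00.075"],
    [50.41, 51.96, 51.98, 52.00, 52.01, 51.97, 52.01, 52.10],
)
tser_b = make_series(
    ["2016-05-24 13:30:00.023", "2016-05-25 13:30:00.023",
     "2016-05-26 13:30:00.030", "2016-05-27 13:30:00.041",
     "2016-05-28 13:30:00.048", "2016-05-29 13:30:00.049",
     "2016-06-02 13:30:00.072", "2016-06-03 13:30:00.075"],
    [74.41, 74.96, 74.98, 73.00, 73.01, 73.97, 72.01, 72.10],
)

# Series.corr aligns on index labels before computing; an inner-join
# align shows which rows survive when timestamps must match exactly.
a_exact, b_exact = tser_a.align(tser_b, join="inner")
n_exact = len(a_exact)  # only 2016-05-25 13:30:00.023 is in both series

# Option 1 (the answer's approach): bin to calendar days, then correlate.
# Days without data become NaN and are dropped by corr.
corr_resampled = tser_a.resample("D").mean().corr(tser_b.resample("D").mean())

# Option 2 (alternative): truncate each timestamp to midnight and keep
# only the days present in both series.
a_daily = pd.Series(tser_a.to_numpy(), index=tser_a.index.normalize())
b_daily = pd.Series(tser_b.to_numpy(), index=tser_b.index.normalize())
common_days = a_daily.index.intersection(b_daily.index)
corr_daily = a_daily.loc[common_days].corr(b_daily.loc[common_days])

print(n_exact)           # 1
print(len(common_days))  # 7
print(round(corr_resampled, 4), round(corr_daily, 4))  # -0.5523 -0.5523
```

With a single exactly matching timestamp there is nothing meaningful to correlate, which is why the raw call is unreliable here. Both day-level variants use the same seven days, drop 2016-05-24 and 2016-06-04, and agree with the value shown above. If a series can contain several observations per day, resampling with an explicit aggregation is the safer choice.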
