How to use Pandas to keep the most recent data from Series with overlapping times

I have several pandas Series that have some overlapping times:

In [1]: import pandas as pd

In [2]: cycle_00z = pd.Series(data=[10, 10, 10, 10], 
           index=pd.date_range('2015-01-01 00', '2015-01-01 03', freq='H'))
In [3]: cycle_02z = pd.Series(data=[20, 20, 20, 20], 
           index=pd.date_range('2015-01-01 02', '2015-01-01 05', freq='H'))
In [4]: cycle_04z = pd.Series(data=[30, 30, 30, 30], 
           index=pd.date_range('2015-01-01 04', '2015-01-01 07', freq='H'))

In [5]: cycle_00z
Out[5]: 
2015-01-01 00:00:00    10
2015-01-01 01:00:00    10
2015-01-01 02:00:00    10
2015-01-01 03:00:00    10
Freq: H, dtype: int64

In [6]: cycle_02z
Out[6]: 
2015-01-01 02:00:00    20
2015-01-01 03:00:00    20
2015-01-01 04:00:00    20
2015-01-01 05:00:00    20
Freq: H, dtype: int64

In [7]: cycle_04z
Out[7]: 
2015-01-01 04:00:00    30
2015-01-01 05:00:00    30
2015-01-01 06:00:00    30
2015-01-01 07:00:00    30
Freq: H, dtype: int64

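I would like to create another pandas Series from these three that contains the unique times from all three cycles and, where times overlap, the data from the most recent cycle. In this case it would look like this: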

In [8]: continuous = pd.Series(data=[10, 10, 20, 20, 30, 30, 30, 30], 
             index=pd.date_range('2015-01-01 00', '2015-01-01 07', freq='H'))

In [9]: continuous
Out[9]: 
2015-01-01 00:00:00    10
2015-01-01 01:00:00    10
2015-01-01 02:00:00    20
2015-01-01 03:00:00    20
2015-01-01 04:00:00    30
2015-01-01 05:00:00    30
2015-01-01 06:00:00    30
2015-01-01 07:00:00    30
Freq: H, dtype: int64

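Is there a slick way to accomplish this with pandas? I will actually need to implement the technique with xray DataArrays, but I imagine the idea would be the same. Essentially, always keep the data from the most recent cycle.

Thanks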

One way is to use the combine_first method:

In [39]: cycle_04z.combine_first(cycle_02z).combine_first(cycle_00z)
Out[39]: 
2015-01-01 00:00:00    10
2015-01-01 01:00:00    10
2015-01-01 02:00:00    20
2015-01-01 03:00:00    20
2015-01-01 04:00:00    30
2015-01-01 05:00:00    30
2015-01-01 06:00:00    30
2015-01-01 07:00:00    30
Freq: H, dtype: float64
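
The order of the chain matters here: combine_first keeps the values of the Series it is called on and only fills in timestamps that are missing there (or hold NaN) from its argument. So the newest cycle has to be the caller. A minimal sketch of that behaviour, where the names older, newer and overlap are just illustrative and not from the question:

```python
import pandas as pd

# Two illustrative hourly series that overlap at 02:00 and 03:00.
hour = pd.Timedelta(hours=1)
older = pd.Series(10, index=pd.date_range('2015-01-01 00', periods=4, freq=hour))
newer = pd.Series(20, index=pd.date_range('2015-01-01 02', periods=4, freq=hour))

overlap = pd.Timestamp('2015-01-01 02:00')

# Caller wins on overlapping timestamps; the argument only fills the gaps.
newest_wins = newer.combine_first(older)   # keeps 20 at 02:00
oldest_wins = older.combine_first(newer)   # keeps 10 at 02:00

print(newest_wins.loc[overlap], oldest_wins.loc[overlap])
```

Both results cover the same six timestamps; only the values in the overlap differ.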

Or, if you are performing the updates in a loop, something like this would work:

In [40]: result = cycle_00z

In [41]: result = cycle_02z.combine_first(result)

In [42]: result = cycle_04z.combine_first(result)

In [43]: result
Out[43]: 
2015-01-01 00:00:00    10
2015-01-01 01:00:00    10
2015-01-01 02:00:00    20
2015-01-01 03:00:00    20
2015-01-01 04:00:00    30
2015-01-01 05:00:00    30
2015-01-01 06:00:00    30
2015-01-01 07:00:00    30
Freq: H, dtype: float64
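
If the number of cycles is not fixed, the same step-by-step update can be folded over a list. This is only a sketch, assuming the cycles are already ordered from oldest to newest; make_cycle and keep_most_recent are hypothetical helper names, not part of pandas:

```python
from functools import reduce

import pandas as pd


def make_cycle(start, value, periods=4):
    """Build one illustrative hourly cycle (hypothetical helper)."""
    index = pd.date_range(start, periods=periods, freq=pd.Timedelta(hours=1))
    return pd.Series(value, index=index)


def keep_most_recent(cycles):
    """Merge cycles ordered oldest -> newest, letting newer data win.

    Each step calls combine_first on the newer series, so its values are
    kept and only the timestamps it lacks are filled from the older result.
    """
    return reduce(lambda merged, newer: newer.combine_first(merged), cycles)


cycles = [
    make_cycle('2015-01-01 00', 10),
    make_cycle('2015-01-01 02', 20),
    make_cycle('2015-01-01 04', 30),
]

result = keep_most_recent(cycles)
print(result)
```

Depending on the pandas version the result may come back as float64, as in the output above; result.astype(cycles[0].dtype) should restore the integer dtype when no values are missing.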