How to use Pandas to keep the most recent data from Series with overlapping times
I have multiple pandas Series that have some overlapping times:
In [1]: import pandas as pd
In [2]: cycle_00z = pd.Series(data=[10, 10, 10, 10],
                              index=pd.date_range('2015-01-01 00', '2015-01-01 03', freq='H'))
In [3]: cycle_02z = pd.Series(data=[20, 20, 20, 20],
                              index=pd.date_range('2015-01-01 02', '2015-01-01 05', freq='H'))
In [4]: cycle_04z = pd.Series(data=[30, 30, 30, 30],
                              index=pd.date_range('2015-01-01 04', '2015-01-01 07', freq='H'))
In [5]: cycle_00z
Out[5]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 10
2015-01-01 03:00:00 10
Freq: H, dtype: int64
In [6]: cycle_02z
Out[6]:
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 20
2015-01-01 05:00:00 20
Freq: H, dtype: int64
In [7]: cycle_04z
Out[7]:
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: int64
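I would like to create another pandas Series from these three that contains the unique times from all three cycles and, where times overlap, the most recent data. In this case it would look like this: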
In [8]: continuous = pd.Series(data=[10, 10, 20, 20, 30, 30, 30, 30],
                               index=pd.date_range('2015-01-01 00', '2015-01-01 07', freq='H'))
In [9]: continuous
Out[9]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: int64
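Is there a clever way to accomplish this with pandas? I actually need to implement the technique on xray DataArrays, but I assume the idea is the same. Essentially, always keep the data from the most recent cycle.
Thanks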
One way is to use the combine_first method:
In [39]: cycle_04z.combine_first(cycle_02z).combine_first(cycle_00z)
Out[39]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: float64
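Note that the result above comes back as float64 even though the inputs are int64. In the pandas version shown, aligning the Series on the union of their indexes introduces NaN placeholders, which forces a float dtype; newer pandas releases may keep int64 here. If you need integers, a minimal sketch (assuming the merged result has no remaining gaps) is to cast back afterwards. The `hourly` variable is just my own shorthand, not part of the question:

```python
import pandas as pd

# A Timedelta step avoids depending on version-specific frequency aliases.
hourly = pd.Timedelta(hours=1)

cycle_00z = pd.Series([10, 10, 10, 10],
                      index=pd.date_range('2015-01-01 00', periods=4, freq=hourly))
cycle_02z = pd.Series([20, 20, 20, 20],
                      index=pd.date_range('2015-01-01 02', periods=4, freq=hourly))
cycle_04z = pd.Series([30, 30, 30, 30],
                      index=pd.date_range('2015-01-01 04', periods=4, freq=hourly))

# Newest cycle first: each combine_first only fills times the caller lacks.
combined = cycle_04z.combine_first(cycle_02z).combine_first(cycle_00z)

# Cast back to integers only when no gaps remain; NaN cannot be stored as int64.
if combined.notna().all():
    combined = combined.astype('int64')

print(combined)
```

If the cycles do not cover every hour, the cast is skipped and the result stays float64 with NaN at the missing times.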
Or, if you are performing the updates in a loop, something like this would work:
In [40]: result = cycle_00z
In [41]: result = cycle_02z.combine_first(result)
In [42]: result = cycle_04z.combine_first(result)
In [43]: result
Out[43]:
2015-01-01 00:00:00 10
2015-01-01 01:00:00 10
2015-01-01 02:00:00 20
2015-01-01 03:00:00 20
2015-01-01 04:00:00 30
2015-01-01 05:00:00 30
2015-01-01 06:00:00 30
2015-01-01 07:00:00 30
Freq: H, dtype: float64
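The step-by-step updates above generalize to any number of cycles. Here is a rough sketch, assuming the cycles are supplied in order from oldest to newest; `keep_most_recent` is a hypothetical helper name of mine, not a pandas function:

```python
from functools import reduce

import pandas as pd


def keep_most_recent(cycles):
    """Merge Series ordered oldest to newest; newer values win on overlap."""
    cycles = list(cycles)
    if not cycles:
        raise ValueError("need at least one Series")
    # Each newer cycle takes priority and is filled in from the running result.
    return reduce(lambda result, newer: newer.combine_first(result), cycles)


hourly = pd.Timedelta(hours=1)  # avoids version-specific frequency aliases
starts = ['2015-01-01 00', '2015-01-01 02', '2015-01-01 04']
values = [10, 20, 30]

# Rebuild the three cycles from the question, oldest first.
cycles = [pd.Series([v] * 4, index=pd.date_range(s, periods=4, freq=hourly))
          for s, v in zip(starts, values)]

result = keep_most_recent(cycles)
print(result)
```

One thing to keep in mind: combine_first treats NaN in a newer cycle as missing, so an older cycle's value would show through at that time. That may or may not be what you want.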