Cumulative sum of time from timestamps in pandas

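I am trying to compute the elapsed time (in seconds) from the timestamps in a given event trace. Both the input and the output are pandas DataFrames. How can this be done?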

Sample input:

   CaseID                         Timestamps
        0   2016-01-01 09:51:15.304000+00:00    
        0   2016-01-01 09:53:15.352000+00:00    
        1   2016-01-01 09:51:15.774000+00:00    
        1   2016-01-01 09:51:47.392000+00:00    
        1   2016-01-01 09:52:15.403000+00:00        

The sum should be cumulative within each case; small differences such as milliseconds can be ignored.

Sample output:

Case ID       sum_time
      0              0                
      0            120
      1              0
      1             32
      1             60

This should do the trick:

import numpy as np
import pandas as pd

# recreate original data
ts = """\
2016-01-01 09:51:15.304000+00:00
2016-01-01 09:53:15.352000+00:00
2016-01-01 09:51:15.774000+00:00
2016-01-01 09:51:47.392000+00:00
2016-01-01 09:52:15.403000+00:00""".split("\n")

df = pd.DataFrame({"CaseID": [0, 0, 1, 1, 1],
                   "Timestamp": [pd.Timestamp(tmp) for tmp in ts]})


# solve the problem

def calc_csum(partial_frame):
    """
    Take the rows of one CaseID (with a Timestamp column) and
    return them with a new column "cs": the cumulative sum, in
    seconds, of the gaps between consecutive timestamps.
    """
    # 1. differences between consecutive timestamps
    r = partial_frame["Timestamp"].diff()

    # 2. the first difference is NaT; treat it as zero
    r = r.fillna(pd.Timedelta(0))

    # 3. convert to seconds and use cumsum -> new column
    return partial_frame.assign(cs=np.cumsum(r.dt.total_seconds().values))

# apply to each "sub frame" with the same CaseID;
# group_keys=False keeps the original row index on pandas >= 2.0
res = df.groupby("CaseID", group_keys=False).apply(calc_csum)
print(res)

Result:

    CaseID                        Timestamp       cs
0       0   2016-01-01 09:51:15.304000+00:00    0.000
1       0   2016-01-01 09:53:15.352000+00:00  120.048
2       1   2016-01-01 09:51:15.774000+00:00    0.000
3       1   2016-01-01 09:51:47.392000+00:00   31.618
4       1   2016-01-01 09:52:15.403000+00:00   59.629
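
The cumulative sum of consecutive differences telescopes to "current timestamp minus the first timestamp of the case", so the same numbers can probably be obtained without `apply`. The following is a minimal sketch of that idea, not part of the original answer. It assumes the rows of each case are already in chronological order (as in the sample), and it borrows the column name `sum_time` from the question's sample output. Rounding to whole seconds is one way to ignore the millisecond differences.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "CaseID": [0, 0, 1, 1, 1],
    "Timestamp": pd.to_datetime([
        "2016-01-01 09:51:15.304000+00:00",
        "2016-01-01 09:53:15.352000+00:00",
        "2016-01-01 09:51:15.774000+00:00",
        "2016-01-01 09:51:47.392000+00:00",
        "2016-01-01 09:52:15.403000+00:00",
    ]),
})

# First timestamp of each case, broadcast back to every row of that case
# (assumes rows are sorted by time within a case; use "min" otherwise)
first = df.groupby("CaseID")["Timestamp"].transform("first")

# Elapsed time since the start of the case, in seconds
elapsed = (df["Timestamp"] - first).dt.total_seconds()

# Round to whole seconds to drop the millisecond noise
df["sum_time"] = elapsed.round().astype(int)

print(df[["CaseID", "sum_time"]])
```

With the sample data, `sum_time` comes out as 0, 120, 0, 32, 60, which matches the output requested in the question.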