Cumulative sum of time from timestamps in pandas
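I am trying to compute the cumulative sum of elapsed time (in seconds) from the timestamps in a given event trace. Both the input and the output are pandas DataFrames. How can this be done?
Example input: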
CaseID Timestamps
0 2016-01-01 09:51:15.304000+00:00
0 2016-01-01 09:53:15.352000+00:00
1 2016-01-01 09:51:15.774000+00:00
1 2016-01-01 09:51:47.392000+00:00
1 2016-01-01 09:52:15.403000+00:00
I also want the sum to accumulate within each case, ignoring small differences such as milliseconds.
Example output:
Case ID sum_time
0 0
0 120
1 0
1 32
1 60
This should do the trick:
import numpy as np
import pandas as pd

# recreate original data
ts = """\
2016-01-01 09:51:15.304000+00:00
2016-01-01 09:53:15.352000+00:00
2016-01-01 09:51:15.774000+00:00
2016-01-01 09:51:47.392000+00:00
2016-01-01 09:52:15.403000+00:00""".split("\n")
df = pd.DataFrame({"CaseID": [0, 0, 1, 1, 1],
                   "Timestamp": [pd.Timestamp(tmp) for tmp in ts]})

# solve the problem
def calc_csum(partial_frame):
    """
    Take a data frame with a Timestamp column;
    return a copy with a new column "cs" holding the cumulative sum in seconds.
    """
    # 1. create the difference array
    r = partial_frame.Timestamp.diff()
    # 2. fill the first value (NaT) with zero
    r = r.fillna(pd.Timedelta(0))
    # 3. convert to seconds and use cumsum -> new column
    #    (assign returns a copy, so the group passed in is not mutated)
    return partial_frame.assign(cs=np.cumsum(r.dt.total_seconds().values))

# apply to each "sub frame" with same CaseID;
# group_keys=False keeps the original row index on pandas >= 2.0
res = df.groupby("CaseID", group_keys=False).apply(calc_csum)
print(res)
Result:
CaseID Timestamp cs
0 0 2016-01-01 09:51:15.304000+00:00 0.000
1 0 2016-01-01 09:53:15.352000+00:00 120.048
2 1 2016-01-01 09:51:15.774000+00:00 0.000
3 1 2016-01-01 09:51:47.392000+00:00 31.618
4 1 2016-01-01 09:52:15.403000+00:00 59.629
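The result above still carries fractional seconds, while the question asks for whole seconds (0, 120, 0, 32, 60). As a possible variation, not part of the original answer: the cumulative sum of consecutive differences within a case equals the offset from that case's first timestamp, so the per-group apply can be replaced by a single subtraction. The sketch below assumes the rows of each case are already in chronological order; the column name sum_time is taken from the question's example output.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "CaseID": [0, 0, 1, 1, 1],
    "Timestamp": pd.to_datetime([
        "2016-01-01 09:51:15.304000+00:00",
        "2016-01-01 09:53:15.352000+00:00",
        "2016-01-01 09:51:15.774000+00:00",
        "2016-01-01 09:51:47.392000+00:00",
        "2016-01-01 09:52:15.403000+00:00",
    ]),
})

# First timestamp of each case, broadcast back to every row of that case.
first_ts = df.groupby("CaseID")["Timestamp"].transform("first")

# Elapsed time since the start of the case, rounded to whole seconds.
elapsed = (df["Timestamp"] - first_ts).dt.total_seconds()
df["sum_time"] = elapsed.round().astype(int)

print(df[["CaseID", "sum_time"]])
```

If the rows might be out of order, sorting with df.sort_values(["CaseID", "Timestamp"]) beforehand would be one way to satisfy that assumption.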