How to calculate a process duration from a time-series dataset with Pandas

Question

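I have a large dataset containing data from various sensors, sorted chronologically (by timestamp) and by sensor type. I want to calculate the process duration in seconds by subtracting a sensor's first entry from its last entry. This is to be done with Python and pandas. An example is attached for better understanding.

For each sensor type, I want to subtract the first row from the last row to get the process duration in seconds (i.e. row 8 minus row 1: 2022-04-04T09:44:56.962Z - 2022-04-04T09:44:56.507Z = 0.455 seconds). The duration should then be written into a newly created column, in the last row of that sensor type.

Thanks in advance!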

Answer 1

Assuming your 'timestamp' column has already been converted with 'to_datetime', does this work?

df['diffPerSensor_type']=df.groupby('sensor_type')['timestamp'].transform('last')-df.groupby('sensor_type')['timestamp'].transform('first')

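You can then extract the duration in seconds. Use total_seconds() rather than .dt.seconds here: .dt.seconds returns only the whole-seconds component, so a 0.455-second duration would come out as 0.

df['diffPerSensor_type'].dt.total_seconds()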
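
The question also asks for the duration to appear only in the last row of each sensor type, whereas transform fills every row of the group. A minimal sketch of one way to do that follows. The sample data, the sensor labels 'a' and 'b', and the column name 'duration_s' are made up for illustration, and it assumes the rows of each sensor type are already in chronological order.

```python
import pandas as pd

# Hypothetical sample data modeled on the question (ISO 8601 strings ending in Z)
df = pd.DataFrame({
    'sensor_type': ['a', 'a', 'a', 'b', 'b'],
    'timestamp': [
        '2022-04-04T09:44:56.507Z',
        '2022-04-04T09:44:56.700Z',
        '2022-04-04T09:44:56.962Z',
        '2022-04-04T09:44:57.000Z',
        '2022-04-04T09:44:58.250Z',
    ],
})

# Parse the strings into timezone-aware datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Last minus first timestamp per sensor type, as fractional seconds
grouped = df.groupby('sensor_type')['timestamp']
duration = (grouped.transform('last') - grouped.transform('first')).dt.total_seconds()

# True only on the last row of each sensor type
is_last = ~df.duplicated('sensor_type', keep='last')

# Keep the duration on those rows and leave NaN everywhere else
df['duration_s'] = duration.where(is_last)

print(df)
```

Rows that are not the last of their sensor type hold NaN in 'duration_s'. If an empty string or 0 is preferred there, pass it as the second argument to where.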

Answer 2

If anyone wants a reproducible example, here is a df:

import pandas as pd

df = pd.DataFrame({
    'sensor_type' : [0]*7 + [1]*11 + [13]*5 + [8]*5,
    'timestamp' : pd.date_range('2022-04-04', periods=28, freq='ms'),
    'value' : [128] * 28
})
df['time_diff in milliseconds'] = (df.groupby('sensor_type')['timestamp']
                   .transform(lambda x: x.iloc[-1]-x.iloc[0])
                   .dt.components.milliseconds)

print(df.head(10))
   sensor_type               timestamp  value  time_diff in milliseconds
0            0 2022-04-04 00:00:00.000    128                          6
1            0 2022-04-04 00:00:00.001    128                          6
2            0 2022-04-04 00:00:00.002    128                          6
3            0 2022-04-04 00:00:00.003    128                          6
4            0 2022-04-04 00:00:00.004    128                          6
5            0 2022-04-04 00:00:00.005    128                          6
6            0 2022-04-04 00:00:00.006    128                          6
7            1 2022-04-04 00:00:00.007    128                         10
8            1 2022-04-04 00:00:00.008    128                         10
9            1 2022-04-04 00:00:00.009    128                         10

My solution is almost identical to @Daniel Weigel's, except that I use a lambda to compute the difference.
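
One caveat worth checking against your own data: .dt.components.milliseconds returns only the millisecond field of the timedelta, which matches the full duration in the example above only because every group spans less than one second. A small sketch with a hypothetical 1.25-second duration shows the difference.

```python
import pandas as pd

# Hypothetical duration of 1 second and 250 milliseconds
span = pd.Series([pd.Timedelta(milliseconds=1250)])

# Only the millisecond field of the timedelta (the whole second is dropped)
ms_component = span.dt.components.milliseconds

# The whole duration expressed in milliseconds
total_ms = span.dt.total_seconds() * 1000

print(ms_component.iloc[0], total_ms.iloc[0])
```

Here the component is 250 while the full duration is 1250 ms, so total_seconds() is the safer choice when a process can run longer than a second.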
