按列分组,按时间戳排序并计算 Pandas 数据帧中时间戳之间的差异?
Groupby column, sort by timestamp and calculate diff between timestamps in Pandas Dataframe?
我有以下数据框:
{'Timestamp': {0: Timestamp('2019-06-07 13:37:00'),
1: Timestamp('2019-03-08 09:10:00'),
2: Timestamp('2019-06-24 11:45:00'),
3: Timestamp('2019-01-10 05:32:00'),
4: Timestamp('2019-01-08 12:24:00'),
5: Timestamp('2019-10-21 13:40:00'),
6: Timestamp('2019-03-07 15:50:00'),
7: Timestamp('2019-02-07 14:48:00'),
8: Timestamp('2019-05-15 09:07:00'),
9: Timestamp('2019-12-10 20:56:00'),
10: Timestamp('2019-01-30 13:58:00'),
11: Timestamp('2019-05-15 08:37:00'),
12: Timestamp('2019-10-03 18:17:00'),
13: Timestamp('2019-01-04 13:23:00'),
14: Timestamp('2019-02-26 13:55:00'),
15: Timestamp('2019-07-30 11:47:00'),
16: Timestamp('2019-11-06 17:35:00'),
17: Timestamp('2019-02-22 08:03:00'),
18: Timestamp('2019-07-17 11:16:00'),
19: Timestamp('2019-09-13 09:24:00')},
'Col1': {0: 'B',
1: 'A',
2: 'A',
3: 'A',
4: 'B',
5: 'A',
6: 'C',
7: 'C',
8: 'C',
9: 'C',
10: 'C',
11: 'A',
12: 'B',
13: 'A',
14: 'A',
15: 'B',
16: 'B',
17: 'B',
18: 'D',
19: 'D'}}
我想按 Col1
分组,对组内的时间戳进行排序并计算差异并在第一行中填充 0。
我正在尝试使用以下代码执行此操作,但它没有提供我需要的东西:
df.sort_values(by='Timestamp').groupby('Col1').Timestamp.diff()
如果我没理解错的话,你想按 ["Col1", "Timestamp"]
排序,然后 .groupby
:
df = df.sort_values(by=["Col1", "Timestamp"])
df["result"] = df.groupby("Col1").diff().fillna(pd.Timedelta(0))
print(df)
打印:
Timestamp Col1 result
13 2019-01-04 13:23:00 A 0 days 00:00:00
3 2019-01-10 05:32:00 A 5 days 16:09:00
14 2019-02-26 13:55:00 A 47 days 08:23:00
1 2019-03-08 09:10:00 A 9 days 19:15:00
11 2019-05-15 08:37:00 A 67 days 23:27:00
2 2019-06-24 11:45:00 A 40 days 03:08:00
5 2019-10-21 13:40:00 A 119 days 01:55:00
4 2019-01-08 12:24:00 B 0 days 00:00:00
17 2019-02-22 08:03:00 B 44 days 19:39:00
0 2019-06-07 13:37:00 B 105 days 05:34:00
15 2019-07-30 11:47:00 B 52 days 22:10:00
12 2019-10-03 18:17:00 B 65 days 06:30:00
16 2019-11-06 17:35:00 B 33 days 23:18:00
10 2019-01-30 13:58:00 C 0 days 00:00:00
7 2019-02-07 14:48:00 C 8 days 00:50:00
6 2019-03-07 15:50:00 C 28 days 01:02:00
8 2019-05-15 09:07:00 C 68 days 17:17:00
9 2019-12-10 20:56:00 C 209 days 11:49:00
18 2019-07-17 11:16:00 D 0 days 00:00:00
19 2019-09-13 09:24:00 D 57 days 22:08:00
我有以下数据框:
{'Timestamp': {0: Timestamp('2019-06-07 13:37:00'),
1: Timestamp('2019-03-08 09:10:00'),
2: Timestamp('2019-06-24 11:45:00'),
3: Timestamp('2019-01-10 05:32:00'),
4: Timestamp('2019-01-08 12:24:00'),
5: Timestamp('2019-10-21 13:40:00'),
6: Timestamp('2019-03-07 15:50:00'),
7: Timestamp('2019-02-07 14:48:00'),
8: Timestamp('2019-05-15 09:07:00'),
9: Timestamp('2019-12-10 20:56:00'),
10: Timestamp('2019-01-30 13:58:00'),
11: Timestamp('2019-05-15 08:37:00'),
12: Timestamp('2019-10-03 18:17:00'),
13: Timestamp('2019-01-04 13:23:00'),
14: Timestamp('2019-02-26 13:55:00'),
15: Timestamp('2019-07-30 11:47:00'),
16: Timestamp('2019-11-06 17:35:00'),
17: Timestamp('2019-02-22 08:03:00'),
18: Timestamp('2019-07-17 11:16:00'),
19: Timestamp('2019-09-13 09:24:00')},
'Col1': {0: 'B',
1: 'A',
2: 'A',
3: 'A',
4: 'B',
5: 'A',
6: 'C',
7: 'C',
8: 'C',
9: 'C',
10: 'C',
11: 'A',
12: 'B',
13: 'A',
14: 'A',
15: 'B',
16: 'B',
17: 'B',
18: 'D',
19: 'D'}}
我想按 Col1
分组,对组内的时间戳进行排序并计算差异并在第一行中填充 0。
我正在尝试使用以下代码执行此操作,但它没有提供我需要的东西:
df.sort_values(by='Timestamp').groupby('Col1').Timestamp.diff()
如果我没理解错的话,你想按 ["Col1", "Timestamp"]
排序,然后 .groupby
:
df = df.sort_values(by=["Col1", "Timestamp"])
df["result"] = df.groupby("Col1").diff().fillna(pd.Timedelta(0))
print(df)
打印:
Timestamp Col1 result
13 2019-01-04 13:23:00 A 0 days 00:00:00
3 2019-01-10 05:32:00 A 5 days 16:09:00
14 2019-02-26 13:55:00 A 47 days 08:23:00
1 2019-03-08 09:10:00 A 9 days 19:15:00
11 2019-05-15 08:37:00 A 67 days 23:27:00
2 2019-06-24 11:45:00 A 40 days 03:08:00
5 2019-10-21 13:40:00 A 119 days 01:55:00
4 2019-01-08 12:24:00 B 0 days 00:00:00
17 2019-02-22 08:03:00 B 44 days 19:39:00
0 2019-06-07 13:37:00 B 105 days 05:34:00
15 2019-07-30 11:47:00 B 52 days 22:10:00
12 2019-10-03 18:17:00 B 65 days 06:30:00
16 2019-11-06 17:35:00 B 33 days 23:18:00
10 2019-01-30 13:58:00 C 0 days 00:00:00
7 2019-02-07 14:48:00 C 8 days 00:50:00
6 2019-03-07 15:50:00 C 28 days 01:02:00
8 2019-05-15 09:07:00 C 68 days 17:17:00
9 2019-12-10 20:56:00 C 209 days 11:49:00
18 2019-07-17 11:16:00 D 0 days 00:00:00
19 2019-09-13 09:24:00 D 57 days 22:08:00