Python 删除带有时间条件的行
Python removing rows with time condition
我有 2 组 Dataframe,都具有唯一标识符和格式相同的日期时间数据
“2020-01-01 00:00:01”-日期时间和“12345”-唯一标识符和类型
第一个问题,DF1:
DatetimeX ID Type
2020-01-01 02:00:01 12345 C
2020-01-01 02:00:03 12345 C
2020-01-01 05:00:03 12345 C
2020-01-01 05:03:05 12345 C
2020-01-01 03:00:09 13333 D
2020-01-01 02:00:09 12345 C
2020-01-01 02:01:35 12345 C
2020-01-01 02:10:35 12345 C
2020-01-01 02:00:01 13333 D
2020-01-01 02:05:35 13333 D
2020-01-01 02:00:50 13333 E
2020-01-01 02:00:01 12211 C
2020-01-01 02:09:50 13333 E
2020-01-01 02:11:50 13333 E
我想基于具有相同“类型”的 ID 的第一个时间戳,并在 10 分钟后删除行:
DatetimeX ID Type
2020-01-01 02:00:01 12345 C
2020-01-01 05:00:03 12345 C
2020-01-01 02:10:35 12345 C
2020-01-01 03:00:09 13333 D
2020-01-01 02:00:01 13333 D
2020-01-01 02:00:50 13333 E
2020-01-01 02:00:01 12211 C
2020-01-01 02:11:50 13333 E
我尝试探索 timerange/daterange 但找不到任何类似的编码概念。希望如果有人能指出我可以研究什么样的方法来探索而不是试图获得完整的解决方案。几年没接触python,以前也不熟悉。谢谢
更新了额外的数据行以获得更准确的示例
IIUC 你应该试试 groupby
:
>>> df.groupby((df.Type != df.Type.shift()).cumsum(), as_index=False).first()
DatetimeX ID Type
0 2020-01-01 02:00:01 12345 C
1 2020-01-01 02:00:01 13333 D
2 2020-01-01 02:00:50 13333 E
3 2020-01-01 02:00:01 12211 C
>>>
按连续相同的值分组。
根据你的说法我想根据 ID 的第一个时间戳具有相同的“类型”,并删除行 10 分钟,我相信你可以使用 groupby().transform()
识别第一个时间戳,然后使用布尔掩码:
# also transform('min')
first_timestamps = df.groupby(['ID','Type'])['DatetimeX'].transform('first')
mask = df['DatetimeX'] - first_timestamps < pd.Timedelta('10Min')
df[mask]
但是,由于您的示例数据彼此之间的时间都在 10 分钟以内,因此这不会减少任何内容。
相反,如果我们将上面第二行中的 10Min
更改为 1S
,我们将得到预期的输出:
DatetimeX ID Type
0 2020-01-01 02:00:01 12345 C
4 2020-01-01 02:00:01 13333 D
6 2020-01-01 02:00:50 13333 E
7 2020-01-01 02:00:01 12211 C
添加示例输入数据并简化流程:
Timestamp = pd.to_datetime
data = [{'DatetimeX': Timestamp('2020-01-01 02:00:01'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:00:03'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 05:00:03'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 05:03:05'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 03:00:09'), 'ID': 13333, 'Type': 'D'},
{'DatetimeX': Timestamp('2020-01-01 02:00:09'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:01:35'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:10:35'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:00:01'), 'ID': 13333, 'Type': 'D'},
{'DatetimeX': Timestamp('2020-01-01 02:05:35'), 'ID': 13333, 'Type': 'D'},
{'DatetimeX': Timestamp('2020-01-01 02:00:50'), 'ID': 13333, 'Type': 'E'},
{'DatetimeX': Timestamp('2020-01-01 02:00:01'), 'ID': 12211, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:09:50'), 'ID': 13333, 'Type': 'E'},
{'DatetimeX': Timestamp('2020-01-01 02:11:50'), 'ID': 13333, 'Type': 'E'}]
df1 = pd.DataFrame(data)
col_raw = df1.columns
while True:
df1.sort_values(['ID', 'Type', 'DatetimeX'], inplace=True)
df1['diff1_lt10min'] = df1.groupby(['ID', 'Type'])['DatetimeX'].diff().dt.seconds < 10 * 60
df1['tag_group'] = (~df1['diff1_lt10min']).cumsum()
if df1.duplicated('tag_group').sum()==0:
break
df1 = df1.merge((df1.groupby('tag_group')['DatetimeX'].first()
.reset_index()
.rename(columns={'DatetimeX':'DatetimeX_1st'})),
on='tag_group')
df1['diff2_lt10min'] = (df1.DatetimeX - df1.DatetimeX_1st).dt.seconds < 10 * 60
cond = df1['diff1_lt10min'] & df1['diff2_lt10min']
df1 = df1.loc[~cond, col_raw]
df1 = df1[col_raw]
详情...
# repeat
col_raw = df1.columns
df4 = df1.copy()
n_round = 1
while True:
print('#'*20, f'round {n_round}', '#'*20)
# step 1 sort the values & group by ['Type', 'ID'] calculate the DatetimeX's time diff
# notice: the time-diff is not the actual wanted
df = df4[col_raw].copy()
df.sort_values(['ID', 'Type', 'DatetimeX'], inplace=True)
df['diff'] = df.groupby(['Type', 'ID'])['DatetimeX'].diff()
print('#'*10, 'step1', '#'*10)
print(df)
# step 2, create a tag column to store the first 10min gap from 'diff' column
cond = False
cond |= df['diff'].dt.seconds > 10 * 60
cond |= df['diff'].isnull()
df['tag'] = np.where(cond, 1, 0)
df['tag'] = df['tag'].cumsum().fillna(method = 'ffill')
print('#'*10, 'step2', '#'*10)
print(df)
# step 3, use 'tag' to judge to stop the while loop or not
# tag should be unique
break_sign = df.tag.duplicated().sum()
if break_sign == 0:
break
print('#'*10, 'step3', '#'*10)
print(break_sign)
# step 4:
# create a 'DatetimeX_1st' with the 'tag' group's first DatetimeX
# create a 'diff2' = 'DatetimeX' - 'DatetimeX_1st'
df2 = df.reset_index().set_index('tag')
df2['DatetimeX_1st'] = df.groupby('tag').first()['DatetimeX']
df2['diff2'] = df2['DatetimeX'] - df2['DatetimeX_1st']
print('#'*10, 'step4', '#'*10)
print(df2)
# step 5:
# drop the True < 10min gaps records
# 'diff' and 'diff2' should all < 10min
cond = (df2['diff2'].dt.seconds < 10 * 60) & (df2['diff'].dt.seconds < 10 * 60)
df3 = df2[~cond].copy()
print('#'*10, 'step5', '#'*10)
print(df3)
# step 6:
# reset index
cols = 'tag DatetimeX ID Type'.split()
df4 = df3.reset_index().set_index('index').sort_index()[cols]
print('#'*10, 'step6', '#'*10)
print(df4)
n_round += 1
print()
# get result
result = df[['DatetimeX', 'ID', 'Type']].copy()
result.index.name = None
print()
print('#'*10, 'result', '#'*10)
print(result)
输出:
#################### round 1 ####################
########## step1 ##########
DatetimeX ID Type diff
11 2020-01-01 02:00:01 12211 C NaT
0 2020-01-01 02:00:01 12345 C NaT
1 2020-01-01 02:00:03 12345 C 0 days 00:00:02
5 2020-01-01 02:00:09 12345 C 0 days 00:00:06
6 2020-01-01 02:01:35 12345 C 0 days 00:01:26
7 2020-01-01 02:10:35 12345 C 0 days 00:09:00
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
3 2020-01-01 05:03:05 12345 C 0 days 00:03:02
8 2020-01-01 02:00:01 13333 D NaT
9 2020-01-01 02:05:35 13333 D 0 days 00:05:34
4 2020-01-01 03:00:09 13333 D 0 days 00:54:34
10 2020-01-01 02:00:50 13333 E NaT
12 2020-01-01 02:09:50 13333 E 0 days 00:09:00
13 2020-01-01 02:11:50 13333 E 0 days 00:02:00
########## step2 ##########
DatetimeX ID Type diff tag
11 2020-01-01 02:00:01 12211 C NaT 1
0 2020-01-01 02:00:01 12345 C NaT 2
1 2020-01-01 02:00:03 12345 C 0 days 00:00:02 2
5 2020-01-01 02:00:09 12345 C 0 days 00:00:06 2
6 2020-01-01 02:01:35 12345 C 0 days 00:01:26 2
7 2020-01-01 02:10:35 12345 C 0 days 00:09:00 2
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28 3
3 2020-01-01 05:03:05 12345 C 0 days 00:03:02 3
8 2020-01-01 02:00:01 13333 D NaT 4
9 2020-01-01 02:05:35 13333 D 0 days 00:05:34 4
4 2020-01-01 03:00:09 13333 D 0 days 00:54:34 5
10 2020-01-01 02:00:50 13333 E NaT 6
12 2020-01-01 02:09:50 13333 E 0 days 00:09:00 6
13 2020-01-01 02:11:50 13333 E 0 days 00:02:00 6
########## step3 ##########
8
########## step4 ##########
index DatetimeX ID Type diff \
tag
1 11 2020-01-01 02:00:01 12211 C NaT
2 0 2020-01-01 02:00:01 12345 C NaT
2 1 2020-01-01 02:00:03 12345 C 0 days 00:00:02
2 5 2020-01-01 02:00:09 12345 C 0 days 00:00:06
2 6 2020-01-01 02:01:35 12345 C 0 days 00:01:26
2 7 2020-01-01 02:10:35 12345 C 0 days 00:09:00
3 2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
3 3 2020-01-01 05:03:05 12345 C 0 days 00:03:02
4 8 2020-01-01 02:00:01 13333 D NaT
4 9 2020-01-01 02:05:35 13333 D 0 days 00:05:34
5 4 2020-01-01 03:00:09 13333 D 0 days 00:54:34
6 10 2020-01-01 02:00:50 13333 E NaT
6 12 2020-01-01 02:09:50 13333 E 0 days 00:09:00
6 13 2020-01-01 02:11:50 13333 E 0 days 00:02:00
DatetimeX_1st diff2
tag
1 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:00:02
2 2020-01-01 02:00:01 0 days 00:00:08
2 2020-01-01 02:00:01 0 days 00:01:34
2 2020-01-01 02:00:01 0 days 00:10:34
3 2020-01-01 05:00:03 0 days 00:00:00
3 2020-01-01 05:00:03 0 days 00:03:02
4 2020-01-01 02:00:01 0 days 00:00:00
4 2020-01-01 02:00:01 0 days 00:05:34
5 2020-01-01 03:00:09 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:09:00
6 2020-01-01 02:00:50 0 days 00:11:00
########## step5 ##########
index DatetimeX ID Type diff \
tag
1 11 2020-01-01 02:00:01 12211 C NaT
2 0 2020-01-01 02:00:01 12345 C NaT
2 7 2020-01-01 02:10:35 12345 C 0 days 00:09:00
3 2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
4 8 2020-01-01 02:00:01 13333 D NaT
5 4 2020-01-01 03:00:09 13333 D 0 days 00:54:34
6 10 2020-01-01 02:00:50 13333 E NaT
6 13 2020-01-01 02:11:50 13333 E 0 days 00:02:00
DatetimeX_1st diff2
tag
1 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:10:34
3 2020-01-01 05:00:03 0 days 00:00:00
4 2020-01-01 02:00:01 0 days 00:00:00
5 2020-01-01 03:00:09 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:11:00
########## step6 ##########
tag DatetimeX ID Type
index
0 2 2020-01-01 02:00:01 12345 C
2 3 2020-01-01 05:00:03 12345 C
4 5 2020-01-01 03:00:09 13333 D
7 2 2020-01-01 02:10:35 12345 C
8 4 2020-01-01 02:00:01 13333 D
10 6 2020-01-01 02:00:50 13333 E
11 1 2020-01-01 02:00:01 12211 C
13 6 2020-01-01 02:11:50 13333 E
#################### round 2 ####################
########## step1 ##########
DatetimeX ID Type diff
index
11 2020-01-01 02:00:01 12211 C NaT
0 2020-01-01 02:00:01 12345 C NaT
7 2020-01-01 02:10:35 12345 C 0 days 00:10:34
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
8 2020-01-01 02:00:01 13333 D NaT
4 2020-01-01 03:00:09 13333 D 0 days 01:00:08
10 2020-01-01 02:00:50 13333 E NaT
13 2020-01-01 02:11:50 13333 E 0 days 00:11:00
########## step2 ##########
DatetimeX ID Type diff tag
index
11 2020-01-01 02:00:01 12211 C NaT 1
0 2020-01-01 02:00:01 12345 C NaT 2
7 2020-01-01 02:10:35 12345 C 0 days 00:10:34 3
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28 4
8 2020-01-01 02:00:01 13333 D NaT 5
4 2020-01-01 03:00:09 13333 D 0 days 01:00:08 6
10 2020-01-01 02:00:50 13333 E NaT 7
13 2020-01-01 02:11:50 13333 E 0 days 00:11:00 8
########## result ##########
DatetimeX ID Type
11 2020-01-01 02:00:01 12211 C
0 2020-01-01 02:00:01 12345 C
7 2020-01-01 02:10:35 12345 C
2 2020-01-01 05:00:03 12345 C
8 2020-01-01 02:00:01 13333 D
4 2020-01-01 03:00:09 13333 D
10 2020-01-01 02:00:50 13333 E
13 2020-01-01 02:11:50 13333 E
我有 2 组 Dataframe,都具有唯一标识符和格式相同的日期时间数据
“2020-01-01 00:00:01”-日期时间和“12345”-唯一标识符和类型
第一个问题,DF1:
DatetimeX ID Type
2020-01-01 02:00:01 12345 C
2020-01-01 02:00:03 12345 C
2020-01-01 05:00:03 12345 C
2020-01-01 05:03:05 12345 C
2020-01-01 03:00:09 13333 D
2020-01-01 02:00:09 12345 C
2020-01-01 02:01:35 12345 C
2020-01-01 02:10:35 12345 C
2020-01-01 02:00:01 13333 D
2020-01-01 02:05:35 13333 D
2020-01-01 02:00:50 13333 E
2020-01-01 02:00:01 12211 C
2020-01-01 02:09:50 13333 E
2020-01-01 02:11:50 13333 E
我想基于具有相同“类型”的 ID 的第一个时间戳,并在 10 分钟后删除行:
DatetimeX ID Type
2020-01-01 02:00:01 12345 C
2020-01-01 05:00:03 12345 C
2020-01-01 02:10:35 12345 C
2020-01-01 03:00:09 13333 D
2020-01-01 02:00:01 13333 D
2020-01-01 02:00:50 13333 E
2020-01-01 02:00:01 12211 C
2020-01-01 02:11:50 13333 E
我尝试探索 timerange/daterange 但找不到任何类似的编码概念。希望如果有人能指出我可以研究什么样的方法来探索而不是试图获得完整的解决方案。几年没接触python,以前也不熟悉。谢谢
更新了额外的数据行以获得更准确的示例
IIUC 你应该试试 groupby
:
>>> df.groupby((df.Type != df.Type.shift()).cumsum(), as_index=False).first()
DatetimeX ID Type
0 2020-01-01 02:00:01 12345 C
1 2020-01-01 02:00:01 13333 D
2 2020-01-01 02:00:50 13333 E
3 2020-01-01 02:00:01 12211 C
>>>
按连续相同的值分组。
根据你的说法我想根据 ID 的第一个时间戳具有相同的“类型”,并删除行 10 分钟,我相信你可以使用 groupby().transform()
识别第一个时间戳,然后使用布尔掩码:
# also transform('min')
first_timestamps = df.groupby(['ID','Type'])['DatetimeX'].transform('first')
mask = df['DatetimeX'] - first_timestamps < pd.Timedelta('10Min')
df[mask]
但是,由于您的示例数据彼此之间的时间都在 10 分钟以内,因此这不会减少任何内容。
相反,如果我们将上面第二行中的 10Min
更改为 1S
,我们将得到预期的输出:
DatetimeX ID Type
0 2020-01-01 02:00:01 12345 C
4 2020-01-01 02:00:01 13333 D
6 2020-01-01 02:00:50 13333 E
7 2020-01-01 02:00:01 12211 C
添加示例输入数据并简化流程:
Timestamp = pd.to_datetime
data = [{'DatetimeX': Timestamp('2020-01-01 02:00:01'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:00:03'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 05:00:03'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 05:03:05'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 03:00:09'), 'ID': 13333, 'Type': 'D'},
{'DatetimeX': Timestamp('2020-01-01 02:00:09'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:01:35'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:10:35'), 'ID': 12345, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:00:01'), 'ID': 13333, 'Type': 'D'},
{'DatetimeX': Timestamp('2020-01-01 02:05:35'), 'ID': 13333, 'Type': 'D'},
{'DatetimeX': Timestamp('2020-01-01 02:00:50'), 'ID': 13333, 'Type': 'E'},
{'DatetimeX': Timestamp('2020-01-01 02:00:01'), 'ID': 12211, 'Type': 'C'},
{'DatetimeX': Timestamp('2020-01-01 02:09:50'), 'ID': 13333, 'Type': 'E'},
{'DatetimeX': Timestamp('2020-01-01 02:11:50'), 'ID': 13333, 'Type': 'E'}]
df1 = pd.DataFrame(data)
col_raw = df1.columns
while True:
df1.sort_values(['ID', 'Type', 'DatetimeX'], inplace=True)
df1['diff1_lt10min'] = df1.groupby(['ID', 'Type'])['DatetimeX'].diff().dt.seconds < 10 * 60
df1['tag_group'] = (~df1['diff1_lt10min']).cumsum()
if df1.duplicated('tag_group').sum()==0:
break
df1 = df1.merge((df1.groupby('tag_group')['DatetimeX'].first()
.reset_index()
.rename(columns={'DatetimeX':'DatetimeX_1st'})),
on='tag_group')
df1['diff2_lt10min'] = (df1.DatetimeX - df1.DatetimeX_1st).dt.seconds < 10 * 60
cond = df1['diff1_lt10min'] & df1['diff2_lt10min']
df1 = df1.loc[~cond, col_raw]
df1 = df1[col_raw]
详情...
# repeat
col_raw = df1.columns
df4 = df1.copy()
n_round = 1
while True:
print('#'*20, f'round {n_round}', '#'*20)
# step 1 sort the values & group by ['Type', 'ID'] calculate the DatetimeX's time diff
# notice: the time-diff is not the actual wanted
df = df4[col_raw].copy()
df.sort_values(['ID', 'Type', 'DatetimeX'], inplace=True)
df['diff'] = df.groupby(['Type', 'ID'])['DatetimeX'].diff()
print('#'*10, 'step1', '#'*10)
print(df)
# step 2, create a tag column to store the first 10min gap from 'diff' column
cond = False
cond |= df['diff'].dt.seconds > 10 * 60
cond |= df['diff'].isnull()
df['tag'] = np.where(cond, 1, 0)
df['tag'] = df['tag'].cumsum().fillna(method = 'ffill')
print('#'*10, 'step2', '#'*10)
print(df)
# step 3, use 'tag' to judge to stop the while loop or not
# tag should be unique
break_sign = df.tag.duplicated().sum()
if break_sign == 0:
break
print('#'*10, 'step3', '#'*10)
print(break_sign)
# step 4:
# create a 'DatetimeX_1st' with the 'tag' group's first DatetimeX
# create a 'diff2' = 'DatetimeX' - 'DatetimeX_1st'
df2 = df.reset_index().set_index('tag')
df2['DatetimeX_1st'] = df.groupby('tag').first()['DatetimeX']
df2['diff2'] = df2['DatetimeX'] - df2['DatetimeX_1st']
print('#'*10, 'step4', '#'*10)
print(df2)
# step 5:
# drop the True < 10min gaps records
# 'diff' and 'diff2' should all < 10min
cond = (df2['diff2'].dt.seconds < 10 * 60) & (df2['diff'].dt.seconds < 10 * 60)
df3 = df2[~cond].copy()
print('#'*10, 'step5', '#'*10)
print(df3)
# step 6:
# reset index
cols = 'tag DatetimeX ID Type'.split()
df4 = df3.reset_index().set_index('index').sort_index()[cols]
print('#'*10, 'step6', '#'*10)
print(df4)
n_round += 1
print()
# get result
result = df[['DatetimeX', 'ID', 'Type']].copy()
result.index.name = None
print()
print('#'*10, 'result', '#'*10)
print(result)
输出:
#################### round 1 ####################
########## step1 ##########
DatetimeX ID Type diff
11 2020-01-01 02:00:01 12211 C NaT
0 2020-01-01 02:00:01 12345 C NaT
1 2020-01-01 02:00:03 12345 C 0 days 00:00:02
5 2020-01-01 02:00:09 12345 C 0 days 00:00:06
6 2020-01-01 02:01:35 12345 C 0 days 00:01:26
7 2020-01-01 02:10:35 12345 C 0 days 00:09:00
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
3 2020-01-01 05:03:05 12345 C 0 days 00:03:02
8 2020-01-01 02:00:01 13333 D NaT
9 2020-01-01 02:05:35 13333 D 0 days 00:05:34
4 2020-01-01 03:00:09 13333 D 0 days 00:54:34
10 2020-01-01 02:00:50 13333 E NaT
12 2020-01-01 02:09:50 13333 E 0 days 00:09:00
13 2020-01-01 02:11:50 13333 E 0 days 00:02:00
########## step2 ##########
DatetimeX ID Type diff tag
11 2020-01-01 02:00:01 12211 C NaT 1
0 2020-01-01 02:00:01 12345 C NaT 2
1 2020-01-01 02:00:03 12345 C 0 days 00:00:02 2
5 2020-01-01 02:00:09 12345 C 0 days 00:00:06 2
6 2020-01-01 02:01:35 12345 C 0 days 00:01:26 2
7 2020-01-01 02:10:35 12345 C 0 days 00:09:00 2
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28 3
3 2020-01-01 05:03:05 12345 C 0 days 00:03:02 3
8 2020-01-01 02:00:01 13333 D NaT 4
9 2020-01-01 02:05:35 13333 D 0 days 00:05:34 4
4 2020-01-01 03:00:09 13333 D 0 days 00:54:34 5
10 2020-01-01 02:00:50 13333 E NaT 6
12 2020-01-01 02:09:50 13333 E 0 days 00:09:00 6
13 2020-01-01 02:11:50 13333 E 0 days 00:02:00 6
########## step3 ##########
8
########## step4 ##########
index DatetimeX ID Type diff \
tag
1 11 2020-01-01 02:00:01 12211 C NaT
2 0 2020-01-01 02:00:01 12345 C NaT
2 1 2020-01-01 02:00:03 12345 C 0 days 00:00:02
2 5 2020-01-01 02:00:09 12345 C 0 days 00:00:06
2 6 2020-01-01 02:01:35 12345 C 0 days 00:01:26
2 7 2020-01-01 02:10:35 12345 C 0 days 00:09:00
3 2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
3 3 2020-01-01 05:03:05 12345 C 0 days 00:03:02
4 8 2020-01-01 02:00:01 13333 D NaT
4 9 2020-01-01 02:05:35 13333 D 0 days 00:05:34
5 4 2020-01-01 03:00:09 13333 D 0 days 00:54:34
6 10 2020-01-01 02:00:50 13333 E NaT
6 12 2020-01-01 02:09:50 13333 E 0 days 00:09:00
6 13 2020-01-01 02:11:50 13333 E 0 days 00:02:00
DatetimeX_1st diff2
tag
1 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:00:02
2 2020-01-01 02:00:01 0 days 00:00:08
2 2020-01-01 02:00:01 0 days 00:01:34
2 2020-01-01 02:00:01 0 days 00:10:34
3 2020-01-01 05:00:03 0 days 00:00:00
3 2020-01-01 05:00:03 0 days 00:03:02
4 2020-01-01 02:00:01 0 days 00:00:00
4 2020-01-01 02:00:01 0 days 00:05:34
5 2020-01-01 03:00:09 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:09:00
6 2020-01-01 02:00:50 0 days 00:11:00
########## step5 ##########
index DatetimeX ID Type diff \
tag
1 11 2020-01-01 02:00:01 12211 C NaT
2 0 2020-01-01 02:00:01 12345 C NaT
2 7 2020-01-01 02:10:35 12345 C 0 days 00:09:00
3 2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
4 8 2020-01-01 02:00:01 13333 D NaT
5 4 2020-01-01 03:00:09 13333 D 0 days 00:54:34
6 10 2020-01-01 02:00:50 13333 E NaT
6 13 2020-01-01 02:11:50 13333 E 0 days 00:02:00
DatetimeX_1st diff2
tag
1 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:00:00
2 2020-01-01 02:00:01 0 days 00:10:34
3 2020-01-01 05:00:03 0 days 00:00:00
4 2020-01-01 02:00:01 0 days 00:00:00
5 2020-01-01 03:00:09 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:00:00
6 2020-01-01 02:00:50 0 days 00:11:00
########## step6 ##########
tag DatetimeX ID Type
index
0 2 2020-01-01 02:00:01 12345 C
2 3 2020-01-01 05:00:03 12345 C
4 5 2020-01-01 03:00:09 13333 D
7 2 2020-01-01 02:10:35 12345 C
8 4 2020-01-01 02:00:01 13333 D
10 6 2020-01-01 02:00:50 13333 E
11 1 2020-01-01 02:00:01 12211 C
13 6 2020-01-01 02:11:50 13333 E
#################### round 2 ####################
########## step1 ##########
DatetimeX ID Type diff
index
11 2020-01-01 02:00:01 12211 C NaT
0 2020-01-01 02:00:01 12345 C NaT
7 2020-01-01 02:10:35 12345 C 0 days 00:10:34
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28
8 2020-01-01 02:00:01 13333 D NaT
4 2020-01-01 03:00:09 13333 D 0 days 01:00:08
10 2020-01-01 02:00:50 13333 E NaT
13 2020-01-01 02:11:50 13333 E 0 days 00:11:00
########## step2 ##########
DatetimeX ID Type diff tag
index
11 2020-01-01 02:00:01 12211 C NaT 1
0 2020-01-01 02:00:01 12345 C NaT 2
7 2020-01-01 02:10:35 12345 C 0 days 00:10:34 3
2 2020-01-01 05:00:03 12345 C 0 days 02:49:28 4
8 2020-01-01 02:00:01 13333 D NaT 5
4 2020-01-01 03:00:09 13333 D 0 days 01:00:08 6
10 2020-01-01 02:00:50 13333 E NaT 7
13 2020-01-01 02:11:50 13333 E 0 days 00:11:00 8
########## result ##########
DatetimeX ID Type
11 2020-01-01 02:00:01 12211 C
0 2020-01-01 02:00:01 12345 C
7 2020-01-01 02:10:35 12345 C
2 2020-01-01 05:00:03 12345 C
8 2020-01-01 02:00:01 13333 D
4 2020-01-01 03:00:09 13333 D
10 2020-01-01 02:00:50 13333 E
13 2020-01-01 02:11:50 13333 E