Calc of timedelta between IN and OUT events after midnight

Below is my sandbox code; the original is obviously much larger...

The main problem is that I can't figure out how to calculate the timedelta when the sign-out happens after midnight, yet the event should still be treated as belonging to the previous day...

Example: sign-in at 22/03/2022 18:00:00 and sign-out at 23/03/2022 01:00:00. The result should show a time difference of 7 hours, with Log Day = Tuesday and Date = 22/03/2022.
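Just to make the target concrete: the plain subtraction already gives the 7 hours, the open question is only which day the session gets booked to. A throwaway check (not part of the real script):

import pandas as pd

# the example pair above; the sign-out falls after midnight
log_in = pd.Timestamp('2022-03-22 18:00:00')
log_out = pd.Timestamp('2022-03-23 01:00:00')

print((log_out - log_in).total_seconds() / 3600)        # 7.0 hours
print(log_in.day_name(), log_in.strftime('%d/%m/%Y'))   # Tuesday 22/03/2022 -> day the session should be booked to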

At the moment I am thinking of creating a new date column shifted back by 4 hours, df['NewDate'] = (pd.to_datetime(df['LogDateTime']) - timedelta(hours=4)).dt.strftime('%d/%m/%Y'), and later splitting and merging on that new date, but I'm not sure this is the best way forward...
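A rough sketch of that shift idea on a two-row toy frame (assuming a 4-hour offset is safe, i.e. nobody ever signs out after 04:00; the toy variable name is only for illustration):

import pandas as pd
from datetime import timedelta

toy = pd.DataFrame({'LogDateTime': ['2022-03-22 18:00:00', '2022-03-23 01:00:00'],
                    'EventType': ['In', 'Out']})
toy['LogDateTime'] = pd.to_datetime(toy['LogDateTime'])

# shift every timestamp back by 4 hours before taking the date,
# so the 01:00 sign-out is still dated 22/03/2022
toy['NewDate'] = (toy['LogDateTime'] - timedelta(hours=4)).dt.strftime('%d/%m/%Y')
print(toy)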

Any ideas or pointers would be very helpful... (thanks in advance)

import pandas as pd
data = [[515, '2022-03-16 17:01:11',    'In', 'Wed', '16/03/2022'],
        [515, '2022-03-17 00:16:36',    'Out', 'Thu', '17/03/2022'],
        [515, '2022-03-17 00:16:49',    'Out', 'Thu', '17/03/2022'],
        [515, '2022-03-17 16:42:40',    'In', 'Thu', '17/03/2022'],
        [515, '2022-03-17 23:48:09',    'Out', 'Thu', '17/03/2022'],
        [333, '2022-03-17 17:16:36',    'In', 'Thu', '17/03/2022'],
        [333, '2022-03-17 17:16:45',    'In', 'Thu', '17/03/2022'],
        [333, '2022-03-17 17:16:51',    'In', 'Thu', '17/03/2022'],
        [333, '2022-03-17 22:55:03',    'Out', 'Thu', '17/03/2022']]

df = pd.DataFrame(data, columns=['ID', 'LogDateTime', 'EventType', 'Log Day', 'Date'])
df.LogDateTime = pd.to_datetime(df.LogDateTime)
df.dtypes

df = df.sort_values(by=['LogDateTime', 'ID'])

dfIn = df.loc[(df.EventType == "In")]
dfOut = df.loc[(df.EventType == "Out")]

# dfIn = dfIn.drop_duplicates(subset=['ID', 'Date', 'Log Day'], keep='first') # Cannot be used due to midnight issue...
# dfOut = dfOut.drop_duplicates(subset=['ID', 'Date', 'Log Day'], keep='first') # Cannot be used due to midnight issue...

FinDF = pd.merge(dfIn, dfOut, on=['ID', 'Log Day', 'Date'], how='left')
FinDF.rename(columns={'LogDateTime_x': 'Log In', 'LogDateTime_y': 'Log Out'}, inplace=True)

FinDF['Hours'] = round((FinDF['Log Out'] - FinDF['Log In']).astype('timedelta64[m]') / 60, 2)
FinDF.to_excel(r'C:\Test\test.xlsx', index=False)

As the sample above shows, the data contains a lot of duplicate events. I tried keeping the maximum timestamp for the Out events and the opposite (the minimum) for the In events, but because of the midnight issue I lose data...
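Roughly the kind of thing I tried, run against the sample frame above (variable names here are only for illustration); it shows where the data goes missing:

# naive approach: first 'In' and last 'Out' per (ID, Date)
first_in = df[df.EventType == 'In'].groupby(['ID', 'Date'], as_index=False)['LogDateTime'].min()
last_out = df[df.EventType == 'Out'].groupby(['ID', 'Date'], as_index=False)['LogDateTime'].max()

# ID 515's 'In' dated 16/03 finds no 'Out' dated 16/03 (its sign-out is dated 17/03),
# so the session that crosses midnight comes back without a sign-out timestamp
check = first_in.merge(last_out, on=['ID', 'Date'], how='left', suffixes=(' In', ' Out'))
print(check)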

The final result should look like this:

Before merging the DataFrames, you have to keep only the first "In" and the last "Out" whenever several of them appear consecutively. Then you can carry on with your existing logic of merging the two frames and computing the time difference.

Try:

df = df.sort_values(by=['ID','LogDateTime'])

# start a new group whenever the ID changes, or the EventType flips within the same ID,
# so each group is one consecutive run of "In"s or "Out"s per person
groups = ((df["ID"].eq(df["ID"].shift()) & df["EventType"].ne(df["EventType"].shift())) | (df["ID"].ne(df["ID"].shift()))).cumsum()

#minimum timestamp to be used for EventType = "In"
mins = df.groupby(groups)["LogDateTime"].transform("min")

#maximum timestamp to be used for EventType = "Out"
maxs = df.groupby(groups)["LogDateTime"].transform("max")

# keep only the first row of each "In" run and the last row of each "Out" run
condensed = df[df["LogDateTime"].eq(mins.where(df["EventType"].eq("In"), maxs))].reset_index(drop=True)
# the remaining rows strictly alternate In/Out, so every two rows form one session
condensed["Group"] = condensed.index // 2

# pair each "In" with its matching "Out", then compute the duration in hours
FinDF = condensed[condensed["EventType"].eq("In")].merge(
    condensed.loc[condensed["EventType"].eq("Out"), ["LogDateTime", "Group"]], on="Group"
).drop(["EventType", "Group"], axis=1)
FinDF = FinDF.rename(columns={'LogDateTime_x': 'Log In', 'LogDateTime_y': 'Log Out'})
FinDF['Hours'] = FinDF["Log Out"].sub(FinDF["Log In"]).dt.total_seconds().div(3600).round(2)

>>> FinDF
    ID              Log In Log Day        Date             Log Out  Hours
0  333 2022-03-17 17:16:36     Thu  17/03/2022 2022-03-17 22:55:03   5.64
1  515 2022-03-16 17:01:11     Wed  16/03/2022 2022-03-17 00:16:49   7.26
2  515 2022-03-17 16:42:40     Thu  17/03/2022 2022-03-17 23:48:09   7.09
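If you want to follow what the intermediate steps do on your sample, printing the helper objects is enough (purely for inspection, not part of the result):

print(df.assign(Group=groups))   # each consecutive run of In/Out per ID gets its own group id
print(condensed)                 # only the first 'In' and last 'Out' of each run survive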