Strip the time part from data and calculate time difference between consecutive rows using R
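I have an Excel file with a column 'Time' that holds times of day. The column's data type is POSIXct. When I load the Excel file into R, a dummy date (1899-12-31) is attached to every time. I want to remove that date, keep only the time part, and then calculate the difference between consecutive rows grouped by the Emp_Id and Date columns, so that I can see the gap between each employee's clock-in and clock-out times for each day.
This is what my data looks like after loading it into R, with the date attached: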
| Emp_Id | Date | Time | Time_Event |
|--------|:---------:|---------------------:|------------|
| 95 | 3/14/2019 | 1899-12-31 10:47:12 | Clock-In |
| 95 | 3/12/2019 | 1899-12-31 10:51:12 | Clock-In |
| 95 | 3/11/2019 | 1899-12-31 8:15:16 | Clock-Out |
| 95 | 3/12/2019 | 1899-12-31 8:10:07 | Clock-Out |
| 95 | 3/11/2019 | 1899-12-31 10:41:51 | Clock-In |
| 19 | 3/14/2019 | 1899-12-31 6:02:23 | Clock-Out |
| 19 | 3/18/2019 | 1899-12-31 5:44:23 | Clock-In |
| 19 | 3/12/2019 | 1899-12-31 6:05:15 | Clock-Out |
| 19 | 3/12/2019 | 1899-12-31 5:45:57 | Clock-In |
| 19 | 3/14/2019 | 1899-12-31 5:29:32 | Clock-In |
For convenience, here is the data:

```r
Emp_Id <- as.numeric(c("95", "95", "95", "95", "95", "19", "19", "19", "19", "19"))
Date <- c("3/14/2019", "3/12/2019", "3/11/2019", "3/12/2019", "3/11/2019",
          "3/14/2019", "3/18/2019", "3/12/2019", "3/12/2019", "3/14/2019")
Time <- as.POSIXct(c("1899-12-31 10:47:12", "1899-12-31 10:51:12", "1899-12-31 8:15:16",
                     "1899-12-31 8:10:07", "1899-12-31 10:41:51", "1899-12-31 6:02:23",
                     "1899-12-31 5:44:23", "1899-12-31 6:05:15", "1899-12-31 5:45:57",
                     "1899-12-31 5:29:32"))
Time_Event <- c("Clock-In", "Clock-In", "Clock-Out", "Clock-Out", "Clock-In",
                "Clock-Out", "Clock-In", "Clock-Out", "Clock-In", "Clock-In")
df2 <- data.frame(Emp_Id, Date, Time, Time_Event, stringsAsFactors = FALSE)
# Convert the m/d/Y strings into Date objects
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")
```
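The final line, which turns Date into real Date objects, is worth keeping. As character strings, m/d/Y dates compare character by character, so a one-digit day sorts after a two-digit one and any `arrange()`/`order()` step would put rows out of chronological order. A small base R check; the two dates are illustrative only and not taken from the data:

```r
# Character dates compare character by character, so "3/12" sorts before "3/9"
chr_dates <- c("3/9/2019", "3/12/2019")
sorted_chr <- sort(chr_dates)

# Converted to Date, the same values sort chronologically
real_dates <- as.Date(chr_dates, format = "%m/%d/%Y")
sorted_real <- sort(real_dates)

print(sorted_chr)   # "3/12/2019" "3/9/2019"
print(sorted_real)  # "2019-03-09" "2019-03-12"
```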
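Using

```r
df2$Time <- format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S")
```

removes the date part, but it also converts the column to character. Since I need to calculate differences, I can't work with a character column. I went through this link, but it didn't help.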
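One way to drop the dummy date while keeping a numeric type is to convert each time to seconds since midnight, using the `hour`, `min` and `sec` fields of a POSIXlt value. A minimal base R sketch, not taken from the question: the helper name `time_of_day_secs` is hypothetical, `tz = "UTC"` is an assumption that sidesteps local-timezone offsets on 1899 dates, and 20:15:16 is the 24-hour form of the 8:15:16 Clock-Out as it appears in the expected output.

```r
# Hypothetical helper: seconds since midnight for a POSIXct vector,
# ignoring whatever date part is attached
time_of_day_secs <- function(x) {
  lt <- as.POSIXlt(x)
  lt$hour * 3600 + lt$min * 60 + lt$sec
}

t_in  <- as.POSIXct("1899-12-31 10:41:51", tz = "UTC")
t_out <- as.POSIXct("1899-12-31 20:15:16", tz = "UTC")

secs_in  <- time_of_day_secs(t_in)   # 38511
secs_out <- time_of_day_secs(t_out)  # 72916
print(secs_out - secs_in)            # 34405
```

The result is a plain number, so it can be subtracted, summed or compared directly, unlike the `"%H:%M:%S"` strings.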
I tried the code below, but it fails because of the character data type:
```r
df2 <- df2 %>%
  arrange(df2$Emp_Id, df2$Date, df2$Time) %>%
  group_by(df2$Emp_Id, df2$Date) %>%
  mutate(diff = format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S") -
           lag(format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S"),
               default = format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))
```
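The attempt fails for two reasons. `format()` turns the times into character strings, which R cannot subtract, and `df2$Time` inside `mutate()` refers to the whole ungrouped column rather than the current group. The first point can be checked in base R, together with the fact that the original POSIXct values subtract cleanly because every row shares the same 1899-12-31 date. A minimal sketch, assuming `tz = "UTC"` and using the 3/12/2019 shift of employee 95 in 24-hour form, as in the expected output:

```r
t_in  <- as.POSIXct("1899-12-31 10:51:12", tz = "UTC")
t_out <- as.POSIXct("1899-12-31 20:10:07", tz = "UTC")

# Formatting to "HH:MM:SS" produces character strings, and "-" is not defined
# for character vectors, so this subtraction raises an error
chr_failed <- tryCatch({
  format(t_out, "%H:%M:%S") - format(t_in, "%H:%M:%S")
  FALSE
}, error = function(e) {
  message("Error: ", conditionMessage(e))  # "non-numeric argument to binary operator"
  TRUE
})

# Subtracting the POSIXct values themselves works: the shared dummy date cancels out
diff_secs <- as.numeric(difftime(t_out, t_in, units = "secs"))
print(diff_secs)  # 33535
```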
How can I get the final output to look like this:
| Emp_Id | Date | Time | Time_Event | Diff (seconds) |
|--------|:---------:|---------:|------------|------------------|
| 19 | 3/12/2019 | 5:45:57 | Clock-In | NA |
| 19 | 3/12/2019 | 18:05:15 | Clock-Out | 44358 |
| 19 | 3/14/2019 | 5:29:32 | Clock-In | NA |
| 19 | 3/14/2019 | 18:02:23 | Clock-Out | 45171 |
| 19 | 3/18/2019 | 17:44:23 | Clock-In | NA |
| 95 | 3/11/2019 | 10:41:51 | Clock-In | NA |
| 95 | 3/11/2019 | 20:15:16 | Clock-Out | 34405 |
| 95 | 3/12/2019 | 10:51:12 | Clock-In | NA |
| 95 | 3/12/2019 | 20:10:07 | Clock-Out | 33535 |
| 95 | 3/14/2019 | 10:47:12 | Clock-In | NA |
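Since every Time value carries the same dummy date (1899-12-31), there is no need to strip it: subtracting the POSIXct values within each Emp_Id/Date group already gives the time-of-day difference.

```r
library(dplyr)

df2 %>%
  arrange(Emp_Id, Date, Time) %>%     # chronological order within each employee/day
  group_by(Emp_Id, Date) %>%
  # lag() gives NA for the first row of each group, so Diff starts at NA
  mutate(Diff = as.numeric(Time - lag(Time), units = "secs")) %>%
  ungroup()
```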
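The `units = "secs"` argument matters: subtracting two POSIXct values yields a difftime whose units R chooses automatically (hours for gaps of several hours), so a bare `as.numeric()` would return hours, not seconds. A quick base R illustration using the 3/12/2019 shift of employee 19 from the expected output (`tz = "UTC"` assumed):

```r
t_in  <- as.POSIXct("1899-12-31 05:45:57", tz = "UTC")
t_out <- as.POSIXct("1899-12-31 18:05:15", tz = "UTC")

gap <- t_out - t_in                      # difftime; units chosen automatically
print(units(gap))                        # "hours"
print(as.numeric(gap))                   # about 12.32, i.e. hours
print(as.numeric(gap, units = "secs"))   # 44358
```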
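We can also use data.table:

```r
library(data.table)

setDT(df2)                          # convert df2 to a data.table by reference
setorder(df2, Emp_Id, Date, Time)   # chronological order within each employee/day
# Store the gap in a new Diff column (in seconds) instead of overwriting Date
df2[, Diff := as.numeric(Time - shift(Time), units = "secs"), by = .(Emp_Id, Date)]
```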
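The same grouped difference can also be sketched in base R, without dplyr or data.table, using `order()` and `ave()`. This assumes, as the expected output implies, that the afternoon times (the Clock-Outs and the 3/18/2019 Clock-In) are stored in 24-hour form, e.g. 20:15:16 rather than 8:15:16; the data below is the question's sample rewritten that way, with `tz = "UTC"` assumed.

```r
df2 <- data.frame(
  Emp_Id = c(95, 95, 95, 95, 95, 19, 19, 19, 19, 19),
  Date = as.Date(c("3/14/2019", "3/12/2019", "3/11/2019", "3/12/2019", "3/11/2019",
                   "3/14/2019", "3/18/2019", "3/12/2019", "3/12/2019", "3/14/2019"),
                 format = "%m/%d/%Y"),
  Time = as.POSIXct(c("1899-12-31 10:47:12", "1899-12-31 10:51:12", "1899-12-31 20:15:16",
                      "1899-12-31 20:10:07", "1899-12-31 10:41:51", "1899-12-31 18:02:23",
                      "1899-12-31 17:44:23", "1899-12-31 18:05:15", "1899-12-31 05:45:57",
                      "1899-12-31 05:29:32"), tz = "UTC"),
  Time_Event = c("Clock-In", "Clock-In", "Clock-Out", "Clock-Out", "Clock-In",
                 "Clock-Out", "Clock-In", "Clock-Out", "Clock-In", "Clock-In"),
  stringsAsFactors = FALSE
)

# Sort by employee, day and time so consecutive rows are in sequence
df2 <- df2[order(df2$Emp_Id, df2$Date, df2$Time), ]

# Within each Emp_Id/Date group: NA for the first row, then the gap in seconds
df2$Diff <- ave(as.numeric(df2$Time), df2$Emp_Id, df2$Date,
                FUN = function(x) c(NA, diff(x)))

# Show only the time of day, as in the expected output
df2$Time <- format(df2$Time, "%H:%M:%S")
rownames(df2) <- NULL
print(df2)
```

`ave()` returns a vector aligned with the sorted rows, so the result can be assigned straight back as a column.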