Create a new section and take the time difference if the time exceeds a certain amount (R, dplyr)
I have a dataset, df, with 10,000 rows:
DateA
9/9/2019 7:52:16 PM
9/9/2019 7:52:16 PM
9/9/2019 7:52:17 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:19 PM
9/10/2019 1:02:23 AM
9/10/2019 1:02:25 AM
9/10/2019 1:02:26 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:29 AM
9/10/2019 1:02:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:31 AM
9/10/2019 1:03:32 AM
9/10/2019 4:18:48 AM
9/10/2019 4:18:50 AM
9/10/2019 4:18:51 AM
I want this output:
Group Duration
a 3 sec
b 6 sec
c 3 sec
d 3 sec
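I would like to set the threshold to 1 minute (60 seconds). If more than 60 seconds have elapsed between rows, a new group should be created, along with its duration.
dput output: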
structure(list(DateA = structure(c(12L, 12L, 13L, 14L, 14L, 15L,
1L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("9/10/2019 1:02:23 AM",
"9/10/2019 1:02:25 AM", "9/10/2019 1:02:26 AM", "9/10/2019 1:02:27 AM",
"9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM", "9/10/2019 1:03:31 AM",
"9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM", "9/10/2019 4:18:50 AM",
"9/10/2019 4:18:51 AM", "9/9/2019 7:52:16 PM", "9/9/2019 7:52:17 PM",
"9/9/2019 7:52:18 PM", "9/9/2019 7:52:19 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))
I tried:
thresh1 <-60
library(data.table)
setDT(df)[, DateA := as.ITime(as.character(DateA))][,
.(Duration = difftime(max(as.POSIXct(DateA)), min(as.POSIXct(DateA)),
unit = 'sec')),.(group = letters[cumsum(c(TRUE, diff(DateA) > thresh1))])]
However, I am doing something wrong, because I only get one row of output:
group Duration
a 0
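I am not sure what I am doing wrong. Any suggestions are appreciated.
We can convert DateA to POSIXct class, format it so that it keeps information only up to the minute, group by that value, and take the difference between the max and min of each group as its duration. Note that this groups rows by calendar minute rather than by the gap between consecutive rows, so it reproduces the expected output for this sample but is not a general 60-second threshold.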
library(dplyr)
df %>%
  # the sample dates are month/day/year, so parse with mdy_hms()
  mutate(DateA = lubridate::mdy_hms(DateA),
         temp = format(DateA, "%Y-%m-%d %H:%M")) %>%
  group_by(temp) %>%
  summarise(duration = difftime(max(DateA), min(DateA), units = "secs"))
# A tibble: 4 x 2
#  temp             duration
#  <chr>            <drtn>
#1 2019-09-09 19:52 3 secs
#2 2019-09-10 01:02 6 secs
#3 2019-09-10 01:03 3 secs
#4 2019-09-10 04:18 3 secs
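If the groups should follow the actual gap between consecutive timestamps rather than the calendar minute, one possible sketch is to compute the lag difference and start a new group whenever it reaches the threshold. The names `thresh`, `gap`, and `result` are illustrative, and the sample rows are copied from the question. The comparison uses `>=` on the assumption that a gap of exactly 60 seconds (1:02:29 to 1:03:29) should split, since that is what the expected output implies; use `>` if only strictly longer gaps should split.

```r
library(dplyr)

# Sample rows copied from the question (month/day/year, 12-hour clock).
df <- data.frame(DateA = c(
  "9/9/2019 7:52:16 PM", "9/9/2019 7:52:16 PM", "9/9/2019 7:52:17 PM",
  "9/9/2019 7:52:18 PM", "9/9/2019 7:52:18 PM", "9/9/2019 7:52:19 PM",
  "9/10/2019 1:02:23 AM", "9/10/2019 1:02:25 AM", "9/10/2019 1:02:26 AM",
  "9/10/2019 1:02:27 AM", "9/10/2019 1:02:27 AM", "9/10/2019 1:02:29 AM",
  "9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM", "9/10/2019 1:03:29 AM",
  "9/10/2019 1:03:31 AM", "9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM",
  "9/10/2019 4:18:50 AM", "9/10/2019 4:18:51 AM"
))

thresh <- 60  # gap, in seconds, that starts a new group (assumed inclusive)

result <- df %>%
  mutate(DateA = lubridate::mdy_hms(as.character(DateA))) %>%
  arrange(DateA) %>%
  mutate(
    # seconds since the previous row; NA for the first row
    gap = as.numeric(difftime(DateA, lag(DateA), units = "secs")),
    # the first row opens group 1; each gap >= thresh opens the next group
    Group = cumsum(is.na(gap) | gap >= thresh)
  ) %>%
  group_by(Group) %>%
  summarise(Duration = difftime(max(DateA), min(DateA), units = "secs"))

print(result)
```

On the sample data this gives four groups with durations of 3, 6, 3, and 3 seconds. Groups are numbered rather than lettered because `letters` only has 26 entries, which 10,000 rows could easily exceed.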