Filter event logs that fall within a time interval in R using dplyr

Question

I have an event log in the following format.

Original format
I used dplyr to create groups by Date and ID, so a change in either the date or the ID starts a new group.

I only want to keep events that are separated by a time interval of >= 5 seconds and remove the rest. Desired output

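I tried to do this with dplyr and a time lag, since I could not find a way to set the lag interval dynamically. However, my current code checks only a single lag interval, and I end up removing more rows than expected. Current output - all rows in yellow are removed. Ideally, "13:10:22" and "13:10:24" in group 2 should be kept, because the time difference from "13:10:17" to each of them is 5 seconds or more.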
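To make the problem concrete, here is a minimal base-R sketch using five distinct timestamps taken from the P5 rows of the dput data, written as seconds past 13:10:00. The second half reflects one reading of the requirement (measuring from the 13:10:17 event); that reading and the variable names are assumptions for illustration, not a full solution.

```r
# Seconds past 13:10:00 for five distinct P5 timestamps in the dput data
secs <- c(17, 18, 20, 22, 24)

# Lag-based check: each event is compared with the one immediately before it
gap_prev <- c(NA, diff(secs))                 # NA 1 2 2 2
keep_lag <- is.na(gap_prev) | gap_prev >= 5
secs[keep_lag]                                # 17

# Anchor-based check (assumed reading): compare each event with 13:10:17
gap_anchor <- secs - secs[1]                  # 0 1 3 5 7
keep_anchor <- gap_anchor == 0 | gap_anchor >= 5
secs[keep_anchor]                             # 17 22 24
```

Because consecutive events are only 1 or 2 seconds apart, the lag-based filter never sees a gap of 5 seconds, even though 13:10:22 and 13:10:24 are 5 and 7 seconds after 13:10:17.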

I am using "chron" to handle the times. I know the lag logic does not work in my case. Is there a better option than expensive for/if loops?

Code I have used

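library(chron)
library(dplyr)

data$Date <- as.Date(data$Date, format = "%m/%d/%Y")
data$Time <- chron(times = data$Time)

# Sort, then number each (Date, ID) combination as its own group
data <- data %>% arrange(Date, Time, ID)
data$Group <- data %>% group_by(Date, ID) %>% group_indices
# Keep rows at least ~5 seconds (as a fraction of a day) after the previous row
data <- data %>%
        group_by(Group) %>%
        mutate(time.difference = Time - lag(Time)) %>%
        filter(time.difference >= 0.00005787 | is.na(time.difference))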

dput data

structure(list(Date = structure(c(17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17470, 17470, 17470, 17470), class = "Date"), Time = structure(c(0.936400462962963, 0.9425, 0.9425, 0.942511574074074, 0.942523148148148, 0.9703125, 0.548518518518519, 0.548530092592593, 0.54880787037037, 0.54880787037037, 0.548819444444444, 0.548842592592593, 0.548865740740741, 0.548888888888889, 0.557337962962963, 0.6140625, 0.618761574074074, 0.618958333333333, 0.622303240740741), format = "h:m:s", class = "times"), ID = c("P1", "P1", "P1", "P1", "P1", "P1", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P9", "P9", "P9", "P9")), .Names = c("Date", "Time", "ID"), row.names = c(NA, -19L), class = "data.frame")

Answer 1

library(dplyr)
data %>%
  group_by(Group) %>%
  arrange(Group, Date, Time) %>% 
  filter((Time - lag(Time)) >= 5.787037e-05 | row_number() == 1L)
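The constant 5.787037e-05 in the filter is 5 seconds expressed as a fraction of a day, which is the unit the Time values in the dput data are stored in. A small sketch of that arithmetic follows; the variable names are illustrative only, and the two fractions are copied from the dput data (13:10:17 and 13:10:22).

```r
seconds_per_day <- 24 * 60 * 60        # 86400
five_sec <- 5 / seconds_per_day        # 5 seconds as a fraction of a day
five_sec

# Two Time values from the dput data, stored as fractions of a day
time_frac <- c(0.54880787037037, 0.548865740740741)

# Converting the gap to whole seconds sidesteps floating-point edge cases
# that can arise when comparing tiny day fractions directly
round(diff(time_frac) * seconds_per_day)   # 5
```

Comparing rounded seconds rather than raw day fractions is a design choice, not something the answer requires; it just makes a gap of exactly 5 seconds less sensitive to rounding error.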

Answer 2

data$datetime <- as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%Y %H:%M:%S")  
data$group <-  data %>% group_by(ID,by5sec=cut(datetime, breaks="5 sec")) %>%  group_indices
data_filter <- data %>% group_by(group) %>% filter(row_number()==1)

I did this in 2 steps because I wanted to write the intermediate result, with the group indices, to a CSV.
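As a rough illustration of how the binning behaves, the sketch below applies `cut()` with `breaks = "5 sec"` to five timestamps spaced like the P5 events in the question. The date, the UTC time zone, and the variable names are assumptions made for the example. As far as I understand `cut.POSIXt`, the bins are fixed 5-second windows counted from the earliest timestamp, so keeping the first row per bin is not the same as measuring the gap from the previous event.

```r
# Five events at 13:10:17, :18, :20, :22 and :24 (date and tz are assumed)
x <- as.POSIXct("2017-10-30 13:10:17", tz = "UTC") + c(0, 1, 3, 5, 7)

# Fixed 5-second bins starting at the earliest timestamp
bins <- cut(x, breaks = "5 sec")
as.integer(bins)                     # bin number of each event

# Keep the first event in each bin, as filter(row_number() == 1) does per group
first_in_bin <- !duplicated(bins)
format(x[first_in_bin], "%H:%M:%S")
```

With these inputs the first three events share a bin and the last two share the next one, so the first-per-bin rule keeps 13:10:17 and 13:10:22 but not 13:10:24.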
