How to use a for loop to create a new variable based on differences between POSIXct values in log files

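I am trying to loop through a data set of log files to add a variable that stores the server session number for each observation. For the first row, I want to create a new variable 'session number' with the value 1. After that, if 'ResearchNumber' differs from the previous row, I want to use a different session number for that row. If 'ResearchNumber' is the same, I want to check whether the difference in the POSIXct variable is greater than 1800 seconds (30 minutes). If it is, I want to create a new session number (by incrementing it by 1). In all other cases, I want the session number to be the same as in the previous row. In short, I want to create session numbers based on more than 30 minutes of inactivity per participant.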

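I have tried several approaches, but my code does not seem to loop through all the rows, and in the other solutions I tried the time difference was not calculated correctly.

I hope someone can help me with this. All help is appreciated!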


# create example data

ResearchNumber <- c("AL001","AL002","AL003")

DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT')
)

logdata <- data.frame(ResearchNumber, DateTimeTag)


# loop through logdata to add variable to every observation with a server session number

linecount <- 1
for (lines in logdata) {
  if (linecount == 1) {
    session_number <- 1
    logdata$session_number <- session_number
    datetime <- logdata$DateTimeTag
    participantbefore <- logdata$ResearchNumber
    linecount <- (linecount + 1)
  } 
  else if (linecount > 1) {
    difference <- (logdata$DateTimeTag - datetime)
    if (logdata$ResearchNumber != participantbefore) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    }
    else if (difference > 18000) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    }
    else {
      logdata$session_number <- (session_number)
      participantbefore <- logdata$ResearchNumber
      datetime <- logdata$DateTimeTag
    }
  }
}

You beat me to it, @docendo discimus!

Here is a dplyr solution.

library(tidyverse) # brings in dplyr library

# make better example data
ResearchNumber <- c("AL001","AL002","AL003", "AL003", "AL003")

DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT'),
  as.POSIXct('2014-09-29 12:00:00', tz='GMT'),
  as.POSIXct('2014-09-29 12:15:18', tz='GMT')
)

logdata <- data.frame(ResearchNumber, DateTimeTag)

logdata

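logdata <- logdata %>% 
  # sort by participant, then by time, so lag() compares consecutive events
  arrange(ResearchNumber, DateTimeTag) %>% 
  group_by(ResearchNumber) %>% 
  mutate(difftime = difftime(DateTimeTag, lag(DateTimeTag), units = "mins"),
         # a session starts on a participant's first row or after a gap of more than 30 minutes
         DiffSess = case_when(
           is.na(difftime) ~ TRUE,
           difftime > 30 ~ TRUE,
           TRUE ~ FALSE)) %>% 
  ungroup() %>% 
  # the running count of session starts is the session number
  mutate(session_number = cumsum(DiffSess))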

Result

  ResearchNumber DateTimeTag         session_number difftime  DiffSess
  <fct>          <dttm>                       <int> <drtn>    <lgl>   
1 AL001          2014-09-29 10:35:40              1   NA mins TRUE    
2 AL002          2014-09-29 10:35:42              2   NA mins TRUE    
3 AL003          2014-09-29 10:38:18              3   NA mins TRUE    
4 AL003          2014-09-29 12:00:00              4 81.7 mins TRUE    
5 AL003          2014-09-29 12:15:18              4 15.3 mins FALSE
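
For comparison, here is a minimal sketch of a for-loop version, since that is what the question asked for. In the question's code, `for (lines in logdata)` iterates over the columns of the data frame rather than its rows, and each `logdata$...` expression refers to a whole column, which is why the loop does not visit every row. The sketch below indexes rows with `seq_len(nrow(logdata))` instead. The name `timeout_secs` and the sorting step are my own assumptions, not part of the original code.

```r
# Example data (same values as in the dplyr answer)
logdata <- data.frame(
  ResearchNumber = c("AL001", "AL002", "AL003", "AL003", "AL003"),
  DateTimeTag = as.POSIXct(
    c("2014-09-29 10:35:40", "2014-09-29 10:35:42", "2014-09-29 10:38:18",
      "2014-09-29 12:00:00", "2014-09-29 12:15:18"),
    tz = "GMT"
  )
)

# Sort so each participant's rows are contiguous and in time order
logdata <- logdata[order(logdata$ResearchNumber, logdata$DateTimeTag), ]

timeout_secs <- 30 * 60  # assumed threshold: 30 minutes of inactivity

# The first row always starts session 1
session_number <- 1L
logdata$session_number <- NA_integer_
logdata$session_number[1] <- session_number

# seq_len(nrow(x))[-1] visits rows 2..n and is empty for a one-row data frame
for (i in seq_len(nrow(logdata))[-1]) {
  new_participant <- logdata$ResearchNumber[i] != logdata$ResearchNumber[i - 1]
  # Ask for seconds explicitly so the unit does not depend on the gap size
  gap_secs <- as.numeric(difftime(logdata$DateTimeTag[i],
                                  logdata$DateTimeTag[i - 1],
                                  units = "secs"))
  if (new_participant || gap_secs > timeout_secs) {
    session_number <- session_number + 1L
  }
  logdata$session_number[i] <- session_number
}

logdata$session_number
```

On this example data the loop assigns the same session numbers as the dplyr version. Setting `units = "secs"` in `difftime()` matters because subtracting two POSIXct values with `-` picks the unit automatically, so comparing the raw result with a fixed number can silently mix seconds, minutes, and hours.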