R - Efficiently calculating the time elapsed between records based on condition

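I have a dataset made up of monitor readings from a location tracking system. Unfortunately, I'm not skilled enough to reproduce it with random data, so here are the first few records: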

  Time               TagID   MonitorID  Location
2017-10-31 23:03:26 1427435   1352303    A4.18
2017-10-31 23:06:02 1427435   1352303    A4.18
2017-10-31 23:06:20 1427435   1352303    A4.18
2017-10-31 23:06:50 1427435   1352303    A4.18
2017-10-31 23:06:51 1427435   1352303    A4.18
2017-10-31 23:07:20 1427435   1352303    A4.18
                      .
                      .
                      .
2017-11-22 22:29:55 1427435   1349044    B6.24
2017-11-22 22:30:22 1427435    286748    B6.41
2017-11-22 22:30:25 1427435   1349044    B6.24
2017-11-22 22:30:40 1427435    286748    B6.41
2017-11-22 22:30:41 1427435    286748    B6.41
2017-11-22 22:30:55 1427435   1349044    B6.24

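I'm trying to determine how long an RFID tag spends at a particular monitor location by looking at the time that elapses before the MonitorID reading changes. I do this with the following function that I wrote: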

elapsed_time <- function(x) {
  # Prepare variables
  current_monitor <- x$MonitorID[1]
  start_time <- x$Time[1]
  end_time <- NULL
  output <- data.frame("Date" = as.POSIXct(as.character()), "MonitorID" = as.integer(), 
                      "Minutes_elapsed" = as.integer())
  # For loop to iterate over rows
  for (i in 1:nrow(x)) {
    # if the new monitor is the same as the old one then go to next iteration
    # otherwise calculate the time between dates, add values to output
    if (x$MonitorID[i] == current_monitor & i != nrow(x)) {
      next
    } else {
      # Mark what the time is when the location changes
      end_time <- x$Time[i]
      # Calculate time difference
      time_spent <- difftime(end_time, start_time, units = "mins")
      # Create temporary data frame to append to output
      temp <- data.frame(start_time, current_monitor, time_spent)
      # Append temp to output
      output <- rbind(output, setNames(temp, names(output)))
      # Set the new start time to the current time
      start_time <- end_time
      # Set the current monitor tracker to the new monitor
      current_monitor <- x$MonitorID[i]
    }
  }
  # Add monitor mappings to output
  output <- left_join(output, Mmappings[,c(1,2)], by="MonitorID")
  return(output)
}

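The last line can be ignored; it just maps the actual location names back onto the MonitorID readings. The function works as intended, but it takes a long time to run for a single monitor (about 4 minutes), and I want to use it inside another function for roughly 95 monitors at once. I'm sure there is a more efficient way to write this function that would cut down the run time.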

Edit: here is some sample output, as requested:

  Date                MonitorID Minutes_elapsed   Location
1 2017-10-31 23:03:26   1352303 3.36666667 mins    A4.18
2 2017-10-31 23:06:48         0 0.03333333 mins    A4.20
3 2017-10-31 23:06:50   1352303 0.45000000 mins    A4.18
4 2017-10-31 23:07:17         0 0.05000000 mins    A4.20
5 2017-10-31 23:07:20   1352303 0.45000000 mins    A4.18
6 2017-10-31 23:07:47         0 0.05000000 mins    A4.20

In this case the times between changes are short because the readings sometimes bounce to other monitors, but that doesn't matter.

I'll try to build a sample data frame:

library(lubridate) # ymd_hms(), minutes()
library(dplyr)     # %>%, group_by(), summarize()

df1 <- data.frame(
  Time = c("2017-10-31 23:03:26", "2017-10-31 23:06:02", "2017-10-31 23:06:20",
           "2017-10-31 23:06:50", "2017-10-31 23:06:51", "2017-10-31 23:07:20"),
  TagID = rep(1427435, 6),
  MonitorID = rep(1352303, 6),
  Location = rep("A4.18", 6)
)

df1$Time <- ymd_hms(df1$Time)
# Second monitor: the same readings shifted by 30 minutes
df2 <- df1
df2$Time <- df2$Time + minutes(30)
df2$MonitorID <- df2$MonitorID + 1
df2$Location <- "A4.19"
df <- rbind(df1, df2)

So, if your data frame looks like the one above, you can calculate the elapsed time (in minutes) for each monitor ID with the following code:

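# units = "mins" forces minutes; by default difftime() picks the unit automatically
result <- df %>%
  group_by(MonitorID) %>%
  summarize(ElapsedTime = difftime(tail(Time, 1), head(Time, 1), units = "mins"))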

Does this help?
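One thing to watch with grouping on MonitorID alone: if a tag comes back to the same monitor later, all of its visits fall into one group, so the elapsed time runs from the first reading of the first visit to the last reading of the last visit. A minimal sketch of one way around that, assuming each uninterrupted run of the same MonitorID should count as its own visit. The `readings` data and the `visit_id` column are made up for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical sample: the tag sits at monitor 1, moves to 2, then returns to 1
readings <- data.frame(
  Time = as.POSIXct(c("2017-10-31 23:00:00", "2017-10-31 23:02:00",
                      "2017-10-31 23:05:00", "2017-10-31 23:06:00",
                      "2017-10-31 23:10:00", "2017-10-31 23:13:00"),
                    tz = "UTC"),
  MonitorID = c(1, 1, 2, 2, 1, 1)
)

visits <- readings %>%
  arrange(Time) %>%
  # visit_id goes up by one each time MonitorID differs from the previous row
  mutate(visit_id = cumsum(MonitorID != lag(MonitorID, default = first(MonitorID))) + 1) %>%
  group_by(visit_id, MonitorID) %>%
  # first-to-last reading within each visit, in minutes
  summarize(Start = first(Time),
            ElapsedTime = difftime(last(Time), first(Time), units = "mins"),
            .groups = "drop")
```

With this sample, `visits` has three rows (monitor 1, monitor 2, monitor 1 again) instead of two, so the return visit is timed separately.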

    library(tidyverse) # for easy data manipulation
    library(lubridate) # for dealing with dates

    # create the sample data
    # (tribble() replaces frame_data(), which is deprecated in tibble)
    myDf <- tribble(
        ~Time,               ~TagID,   ~MonitorID,  ~Location,
        "2017-10-31 23:03:26", 1427435,   1352303,    "A4.18",
        "2017-10-31 23:06:02", 1427435,   1352303,    "A4.18",
        "2017-10-31 23:06:20", 1427435,   1352303,    "A4.18",
        "2017-10-31 23:06:50", 1427435,   1352303,    "A4.18",
        "2017-10-31 23:06:51", 1427435,   1352303,    "A4.18",
        "2017-10-31 23:07:20", 1427435,   1352303,    "A4.18",
        "2017-11-22 22:29:55", 1427435,   1349044,    "B6.24",
        "2017-11-22 22:30:22", 1427435,    286748,    "B6.41",
        "2017-11-22 22:30:25", 1427435,   1349044,    "B6.24",
        "2017-11-22 22:30:40", 1427435,    286748,    "B6.41",
        "2017-11-22 22:30:41", 1427435,    286748,    "B6.41",
        "2017-11-22 22:30:55", 1427435,   1349044,    "B6.24"
    )

    # convert Time from character to date-time
    # and (important!) sort the data frame
    myDf <- myDf %>%
        mutate(Time = as_datetime(Time)) %>%
        arrange(TagID, Time)

    myDf %>%
        mutate(priorIDtheSame = MonitorID == lag(MonitorID)) %>%
        mutate(priorIDtheSame = replace(priorIDtheSame, is.na(priorIDtheSame), FALSE)) %>%
        mutate(nextIDtheSame = MonitorID == lead(MonitorID)) %>%
        mutate(nextIDtheSame = replace(nextIDtheSame, is.na(nextIDtheSame), FALSE)) %>%
        # drop every row that lies between the first and last reading at one location
        filter(!(priorIDtheSame & nextIDtheSame)) %>%
        # calculate the time difference
        mutate(timeAtThisLocation = Time - lag(Time)) %>%
        # and make sure it is only kept where we need it
        mutate(timeAtThisLocation = replace(timeAtThisLocation, !priorIDtheSame, NA))

The result is

    # A tibble: 8 x 7
                     Time   TagID MonitorID Location priorIDtheSame nextIDtheSame timeAtThisLocation
                   <dttm>   <dbl>     <dbl>    <chr>          <lgl>         <lgl>             <time>
    1 2017-10-31 22:03:26 1427435   1352303    A4.18          FALSE          TRUE            NA secs
    2 2017-10-31 22:07:20 1427435   1352303    A4.18           TRUE         FALSE           234 secs
    3 2017-11-22 21:29:55 1427435   1349044    B6.24          FALSE         FALSE            NA secs
    4 2017-11-22 21:30:22 1427435    286748    B6.41          FALSE         FALSE            NA secs
    5 2017-11-22 21:30:25 1427435   1349044    B6.24          FALSE         FALSE            NA secs
    6 2017-11-22 21:30:40 1427435    286748    B6.41          FALSE          TRUE            NA secs
    7 2017-11-22 21:30:41 1427435    286748    B6.41           TRUE         FALSE             1 secs
    8 2017-11-22 21:30:55 1427435   1349044    B6.24          FALSE         FALSE            NA secs
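If you want output in the same shape as the original function (Date, MonitorID, Minutes_elapsed), the loop can also be replaced with plain vector operations in base R. This is only a sketch under some assumptions: `elapsed_time_vec` is a made-up name, the data holds a single tag, is already sorted by Time, and has at least one row. Like the original function, it measures each visit from its first reading to the first reading at the next monitor.

```r
# Hypothetical vectorized helper; not part of the original post
elapsed_time_vec <- function(x) {
  n <- nrow(x)
  # TRUE on the first row and wherever MonitorID differs from the previous row
  is_start <- c(TRUE, x$MonitorID[-1] != x$MonitorID[-n])
  start_idx <- which(is_start)
  # a visit ends where the next one starts; the last visit ends at the final reading
  end_idx <- c(start_idx[-1], n)
  data.frame(
    Date = x$Time[start_idx],
    MonitorID = x$MonitorID[start_idx],
    Minutes_elapsed = difftime(x$Time[end_idx], x$Time[start_idx], units = "mins")
  )
}

# The last six records from the question
x <- data.frame(
  Time = as.POSIXct(c("2017-11-22 22:29:55", "2017-11-22 22:30:22",
                      "2017-11-22 22:30:25", "2017-11-22 22:30:40",
                      "2017-11-22 22:30:41", "2017-11-22 22:30:55"),
                    tz = "UTC"),
  MonitorID = c(1349044, 286748, 1349044, 286748, 286748, 1349044)
)

out <- elapsed_time_vec(x)
```

One difference from the loop version: when the very last reading is itself a change of monitor, this sketch adds a final row for that visit with 0 minutes, which the loop leaves out. The mapping join at the end of the original function can be applied to `out` unchanged.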