Calculate Rolling 12 Hours by Group in R

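I'm working on a project where, for each patient, I have to include only lab tests done at least 12 hours apart, and keep the timestamp of each included lab test. The problem is that many patients had several labs done within a 12-hour window, and the client has asked that those tests be excluded. Here is what I have so far: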

#Create dummy dataset
df = data.frame(
  "Encounter" = c(rep("12345", times=16), rep("67890", times = 5)),
  "Timestamp" = c("01/06/2022 04:00:00", "01/07/2022 08:00:00",
                   "01/08/2022 00:00:00", "01/08/2022 04:00:00",
                   "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                   "01/09/2022 04:00:00", "01/09/2022 08:00:00",
                   "01/09/2022 20:00:00", "01/09/2022 23:26:00",
                   "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                   "01/10/2022 20:00:00", "01/11/2022 00:00:00",
                   "01/11/2022 20:00:00", "01/12/2022 04:00:00",
                   "11/10/2021 11:00:00", "11/10/2021 12:00:00",
                   "11/10/2021 13:00:00", "11/10/2021 14:00:00",
                   "11/11/2021 00:00:00"))

#Convert timestamp to POSIXct (safer than POSIXlt inside dplyr pipelines; UTC avoids DST shifts)
df$Timestamp <- as.POSIXct(as.character(df$Timestamp), format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

#Calculate time (in hours) since the previous timestamp within each Encounter
library(dplyr)

df <- df %>% 
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  mutate(time_diff = difftime(Timestamp, lag(Timestamp), units = "hours"))

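I can't figure out what to do next. It seems like I need to compute a rolling 12-hour total that resets to 0 each time it reaches 12 hours, but I'm not sure how to do that. Here is my ideal result: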

df$Keep.Row <- c(1,1,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,0,0,0,1)
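Put differently, the target flags follow a sequential rule: keep the first test of each encounter, then keep a test only if it falls at least 12 hours after the most recently kept test. A minimal base-R sketch of that rule, assuming timestamps sorted within each encounter and parsed in UTC; the helper name `keep_after_gap()` is hypothetical, not from the question:

```r
# Hypothetical helper: flag each test that is at least `thresh` hours after
# the most recently kept test. Assumes `ts` is POSIXct, sorted ascending.
keep_after_gap <- function(ts, thresh = 12) {
  keep <- integer(length(ts))
  if (length(ts) == 0) return(keep)
  keep[1] <- 1L                    # the first test of an encounter is kept
  last_kept <- ts[1]
  for (i in seq_along(ts)[-1]) {
    gap <- as.numeric(difftime(ts[i], last_kept, units = "hours"))
    if (gap >= thresh) {
      keep[i] <- 1L
      last_kept <- ts[i]           # later gaps are measured from this test
    }
  }
  keep
}

encounter <- c(rep("12345", 16), rep("67890", 5))
ts <- as.POSIXct(c("01/06/2022 04:00:00", "01/07/2022 08:00:00", "01/08/2022 00:00:00",
                   "01/08/2022 04:00:00", "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                   "01/09/2022 04:00:00", "01/09/2022 08:00:00", "01/09/2022 20:00:00",
                   "01/09/2022 23:26:00", "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                   "01/10/2022 20:00:00", "01/11/2022 00:00:00", "01/11/2022 20:00:00",
                   "01/12/2022 04:00:00", "11/10/2021 11:00:00", "11/10/2021 12:00:00",
                   "11/10/2021 13:00:00", "11/10/2021 14:00:00", "11/11/2021 00:00:00"),
                 format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Run the helper separately for each encounter, then restore the row order
keep <- unsplit(lapply(split(ts, encounter), keep_after_gap), encounter)

target <- c(1,1,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,0,0,0,1)  # Keep.Row from the question
all(keep == target)  # TRUE
```

Comparing against the last kept timestamp, rather than summing gaps, means no running total has to be reset explicitly.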

Maybe I'm missing something, but this doesn't work:

library(dplyr)

df %>% 
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  mutate(time_dif = difftime(Timestamp, lag(Timestamp), units="hours")) %>% 
  filter(time_dif > 12)
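A likely reason: `time_dif` is the gap to the immediately preceding row, not to the last row that was kept. So the first row of each encounter (`time_dif` is `NA`) is dropped by `filter()`, gaps of exactly 12 hours fail the strict `> 12`, and tests such as 01/09/2022 08:00 (only 4 hours after the previous test, but 12 hours after the last kept one) are dropped. A small base-R check on encounter 12345, assuming UTC timestamps, makes the mismatch visible:

```r
# Encounter 12345 from the dummy data (UTC assumed to avoid DST shifts)
ts <- as.POSIXct(c("01/06/2022 04:00:00", "01/07/2022 08:00:00", "01/08/2022 00:00:00",
                   "01/08/2022 04:00:00", "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                   "01/09/2022 04:00:00", "01/09/2022 08:00:00", "01/09/2022 20:00:00",
                   "01/09/2022 23:26:00", "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                   "01/10/2022 20:00:00", "01/11/2022 00:00:00", "01/11/2022 20:00:00",
                   "01/12/2022 04:00:00"),
                 format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
target <- c(1,1,1,0,0,1,0,1,1,0,0,1,1,0,1,0)  # Keep.Row for this encounter

# Gap to the immediately preceding row, as in the filter() attempt
gap_prev <- c(NA, as.numeric(difftime(ts[-1], ts[-length(ts)], units = "hours")))
naive <- as.integer(!is.na(gap_prev) & gap_prev > 12)

# Rows where the naive filter disagrees with the desired result
which(naive != target)  # 1 6 8 9 12 13
```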

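There's nothing elegant about this at all, but I believe it does what you need. I use a temporary variable to store the "rolling" sum, which is reset once the time between tests is 12 hours or more.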

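library(tidyverse)
df <- df %>% 
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  # Gap to the previous test in hours; numeric so replace_na() can fill the first row with 0
  mutate(time_diff = as.numeric(difftime(Timestamp, lag(Timestamp), units = "hours"))) %>%
  replace_na(list(time_diff = 0)) %>%
  # temp: current gap if only the previous gap was 12+ hours, otherwise previous gap + current gap
  mutate(temp = ifelse(time_diff < 12 & lag(time_diff) >= 12, time_diff, lag(time_diff) + time_diff),
         temp = ifelse(is.na(temp), 0, temp),
         # hours_between: the gap itself if it or the previous gap is 12+ hours, else previous temp + gap
         hours_between = ifelse(time_diff >= 12, time_diff,
                                ifelse(time_diff < 12 & lag(time_diff) >= 12, time_diff, lag(temp) + time_diff)),
         # keep the first test of each Encounter (NA) and any test that reaches 12 hours
         keep = ifelse(hours_between >= 12 | is.na(hours_between), 1, 0)) %>%
  select(-temp)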

Created on 2022-01-27 by the reprex package (v2.0.1)
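One caveat worth checking before relying on this: the nested `ifelse()` carries the running total back only through `lag(temp)`, i.e. a fixed number of rows, so a long run of short gaps may never reach the threshold. A base-R sketch comparing that logic with an explicit running sum that resets after each kept test, on a hypothetical encounter with five tests three hours apart (`lag1()` is a hypothetical stand-in for `dplyr::lag()`):

```r
thresh <- 12
time_diff <- c(0, 3, 3, 3, 3)   # hypothetical: tests at 0, 3, 6, 9 and 12 hours
lag1 <- function(x) c(NA, x[-length(x)])  # base-R stand-in for dplyr::lag()

# Same formulas as the answer above, written in base R
temp <- ifelse(time_diff < thresh & lag1(time_diff) >= thresh,
               time_diff, lag1(time_diff) + time_diff)
temp <- ifelse(is.na(temp), 0, temp)
hours_between <- ifelse(time_diff >= thresh, time_diff,
                        ifelse(time_diff < thresh & lag1(time_diff) >= thresh,
                               time_diff, lag1(temp) + time_diff))
keep_ifelse <- ifelse(hours_between >= thresh | is.na(hours_between), 1, 0)

# Explicit running total that resets after each kept test
keep_loop <- numeric(length(time_diff))
keep_loop[1] <- 1                # the first test is always kept
running <- 0
for (i in seq_along(time_diff)[-1]) {
  running <- running + time_diff[i]
  if (running >= thresh) {
    keep_loop[i] <- 1
    running <- 0                 # start a new 12-hour window
  }
}

rbind(keep_ifelse, keep_loop)    # they differ in the last column
```

On the question's dummy data both give the same flags, so the difference only shows up with longer runs of short gaps.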

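Here is an alternative option using accumulate. Take the differences and accumulate them; once the running total reaches the 12-hour threshold, reset by starting over from the next diff value instead of continuing the cumulative sum. To include the first time for each Encounter, you can set its diff to 12 hours, or add a separate mutate that checks where Timestamp == first(Timestamp) and sets keep to 1 in those cases.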

library(tidyverse)

thresh <- 12

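df %>%
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  # gap to the previous test; the first test of each Encounter gets a gap of exactly `thresh` hours
  mutate(diff = difftime(Timestamp, lag(Timestamp, default = first(Timestamp) - (thresh * 60 * 60)), units = "hours"),
         # running total that restarts from the current gap once the previous total reached `thresh`
         keep = +(accumulate(diff, ~if_else(.x >= thresh, .y, .x + .y)) >= thresh))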

Output

   Encounter Timestamp           diff              keep
   <chr>     <dttm>              <drtn>           <int>
 1 12345     2022-01-06 04:00:00 12.0000000 hours     1
 2 12345     2022-01-07 08:00:00 28.0000000 hours     1
 3 12345     2022-01-08 00:00:00 16.0000000 hours     1
 4 12345     2022-01-08 04:00:00  4.0000000 hours     0
 5 12345     2022-01-08 08:00:00  4.0000000 hours     0
 6 12345     2022-01-08 20:00:00 12.0000000 hours     1
 7 12345     2022-01-09 04:00:00  8.0000000 hours     0
 8 12345     2022-01-09 08:00:00  4.0000000 hours     1
 9 12345     2022-01-09 20:00:00 12.0000000 hours     1
10 12345     2022-01-09 23:26:00  3.4333333 hours     0
11 12345     2022-01-10 00:00:00  0.5666667 hours     0
12 12345     2022-01-10 08:00:00  8.0000000 hours     1
13 12345     2022-01-10 20:00:00 12.0000000 hours     1
14 12345     2022-01-11 00:00:00  4.0000000 hours     0
15 12345     2022-01-11 20:00:00 20.0000000 hours     1
16 12345     2022-01-12 04:00:00  8.0000000 hours     0
17 67890     2021-11-10 11:00:00 12.0000000 hours     1
18 67890     2021-11-10 12:00:00  1.0000000 hours     0
19 67890     2021-11-10 13:00:00  1.0000000 hours     0
20 67890     2021-11-10 14:00:00  1.0000000 hours     0
21 67890     2021-11-11 00:00:00 10.0000000 hours     1
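For comparison, the same reset-on-threshold accumulation can be written without purrr, using base R's `Reduce(accumulate = TRUE)` in place of `purrr::accumulate()`. A minimal sketch, assuming POSIXct timestamps in UTC sorted within each encounter; the helper name `keep_accumulate()` is hypothetical:

```r
# Hypothetical helper mirroring the accumulate() answer for one encounter.
# Assumes `ts` is POSIXct and sorted ascending.
keep_accumulate <- function(ts, thresh = 12) {
  # Gaps in hours; the first gap is set to `thresh` so the first test is kept
  gaps <- c(thresh, as.numeric(difftime(ts[-1], ts[-length(ts)], units = "hours")))
  # Restart from the current gap once the running total has reached `thresh`
  acc <- Reduce(function(total, gap) if (total >= thresh) gap else total + gap,
                gaps, accumulate = TRUE)
  as.integer(acc >= thresh)
}

encounter <- c(rep("12345", 16), rep("67890", 5))
ts <- as.POSIXct(c("01/06/2022 04:00:00", "01/07/2022 08:00:00", "01/08/2022 00:00:00",
                   "01/08/2022 04:00:00", "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                   "01/09/2022 04:00:00", "01/09/2022 08:00:00", "01/09/2022 20:00:00",
                   "01/09/2022 23:26:00", "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                   "01/10/2022 20:00:00", "01/11/2022 00:00:00", "01/11/2022 20:00:00",
                   "01/12/2022 04:00:00", "11/10/2021 11:00:00", "11/10/2021 12:00:00",
                   "11/10/2021 13:00:00", "11/10/2021 14:00:00", "11/11/2021 00:00:00"),
                 format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Apply per encounter and restore the original row order
keep <- unsplit(lapply(split(ts, encounter), keep_accumulate), encounter)
keep
```

The flags should match the `keep` column in the output above.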