How to find the time difference between previous and following rows relative to specific rows

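I want to calculate the time difference of the rows before and after a specific row when a condition is met. I am not looking for sequential differences (row 3 - row 2, row 4 - row 3, etc.), but for the difference from a middle row. Another way to put it might be: distance from 0.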

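If the start column says "yes", I want that row's time to be the origin, but only for 5 seconds. I have about 600,000 rows of a roughly continuous time series, so calculating 5 s on either side of each start should hopefully do it without the calculations overlapping. I am not even sure what this would look like in code. Sample data, with many columns omitted for convenience: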

df <- data.frame(
  stringsAsFactors = FALSE,
          initiate = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L),
             start = c("no","no","yes","no","no",
                       "no","no","no","no","yes","no","no","no","no"),
              time = c(2.8225,2.82375,2.825,2.82625,
                       2.827,2.82725,16.8075,16.810,16.82,16.8212,16.825,
                       16.8262,16.8275,16.8300)
)

initiate  start  time
0         no     2.8225
0         no     2.82375
1         yes    2.82500
1         no     2.82625
1         no     2.82700
1         no     2.82725
0         no     16.8075
0         no     16.8100
0         no     16.8200
1         yes    16.8212
1         no     16.8250
0         no     16.8262
1         no     16.8275
1         no     16.8300

My desired output is:

initiate  start  time     diff
0         no     2.8225   -0.00250
0         no     2.82375  -0.00125
1         yes    2.82500  0
1         no     2.82625  0.00125
1         no     2.82700  0.00200
1         no     2.82725  0.00225
0         no     16.8075  -0.0137
0         no     16.8100  -0.0112
0         no     16.8200  -0.0012
1         yes    16.8212  0
1         no     16.8250  0.00380
0         no     16.8262  0.00500
1         no     16.8275  0.00630
1         no     16.8300  0.00880

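I have tried lag, diff and shift, as well as the code below. I cannot get the calculation to restart at each "yes" row. This is the closest I have come, but it only calculates from the first "yes".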

df %>%
  group_by(id, grp = cumsum(lag(start, default = '') == 'yes')) %>% 
  mutate(diff = time - time[match('yes', trial_start)]) %>% 
  {. ->> df}

Using fuzzyjoin might be helpful here:

library(dplyr)
library(fuzzyjoin)

df_grp <- df %>% 
  filter(start == "yes") %>% 
  select(time) %>% 
  group_by(grp = row_number()) %>% 
  mutate(begin = time - 5,
         end = time + 5)

First, we create a data.frame of the origin times (the "yes" rows) together with their -5/+5 second bounds:

# A tibble: 2 x 4
   time   grp begin   end
  <dbl> <int> <dbl> <dbl>
1  2.82     1 -2.17  7.82
2 16.8      2 11.8  21.8 
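
Before joining, it may be worth checking that no two windows overlap, since a row that falls inside two windows would match both and appear twice in a left join. The snippet below is only a sketch of such a check; `windows_overlap` is a hypothetical helper name, not part of dplyr or fuzzyjoin, and the origin times are copied from the sample data.

```r
# Hypothetical helper: TRUE if any two open windows of +/- half_width
# around the given origin times overlap
windows_overlap <- function(origin_times, half_width = 5) {
  gaps <- diff(sort(origin_times))  # spacing between consecutive origins
  any(gaps < 2 * half_width)
}

# Times of the "yes" rows in the sample data
origin_times <- c(2.825, 16.8212)
no_overlap <- !windows_overlap(origin_times)
```

For the sample data the two origins are about 14 s apart, so the windows stay separate.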

Next, we use a fuzzy join to attach it to the original data.frame and calculate the difference:

df %>% 
  fuzzy_left_join(df_grp, 
                  by = c("time" = "begin", "time" = "end"),
                  match_fun = list(`>`, `<`)) %>% 
  group_by(grp) %>% 
  mutate(diff = time.x - time.y) %>% 
  ungroup()

This returns

# A tibble: 14 x 8
   initiate start time.x time.y   grp begin   end     diff
      <int> <chr>  <dbl>  <dbl> <int> <dbl> <dbl>    <dbl>
 1        0 no      2.82   2.82     1 -2.17  7.82 -0.00250
 2        0 no      2.82   2.82     1 -2.17  7.82 -0.00125
 3        1 yes     2.82   2.82     1 -2.17  7.82  0      
 4        1 no      2.83   2.82     1 -2.17  7.82  0.00125
 5        1 no      2.83   2.82     1 -2.17  7.82  0.00200
 6        1 no      2.83   2.82     1 -2.17  7.82  0.00225
 7        0 no     16.8   16.8      2 11.8  21.8  -0.0137 
 8        0 no     16.8   16.8      2 11.8  21.8  -0.0112 
 9        0 no     16.8   16.8      2 11.8  21.8  -0.00120
10        1 yes    16.8   16.8      2 11.8  21.8   0      
11        1 no     16.8   16.8      2 11.8  21.8   0.00380
12        0 no     16.8   16.8      2 11.8  21.8   0.00500
13        1 no     16.8   16.8      2 11.8  21.8   0.00630
14        1 no     16.8   16.8      2 11.8  21.8   0.00880
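
If adding a dependency is not desirable, roughly the same result can be sketched in base R by assigning each row to its nearest "yes" time. This is an alternative technique, not what fuzzyjoin does internally. It assumes the windows do not overlap; where they would, each row is assigned only to the closest origin instead of being duplicated. The names `window`, `origins`, `breaks` and `nearest` are illustrative.

```r
# Sample data from the question
df <- data.frame(
  initiate = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L),
  start = c("no", "no", "yes", "no", "no", "no", "no",
            "no", "no", "yes", "no", "no", "no", "no"),
  time = c(2.8225, 2.82375, 2.825, 2.82625, 2.827, 2.82725, 16.8075,
           16.810, 16.82, 16.8212, 16.825, 16.8262, 16.8275, 16.8300),
  stringsAsFactors = FALSE
)

window <- 5  # seconds on either side of each origin (assumption from the question)

# Times of the "yes" rows, which act as origins
origins <- sort(df$time[df$start == "yes"])

# Midpoints between consecutive origins split the time axis into one
# interval per origin, so findInterval() yields the index of the nearest origin
breaks <- origins[-1] - diff(origins) / 2
nearest <- findInterval(df$time, breaks) + 1L

# Signed distance from the nearest origin; NA outside the +/- 5 s window
d <- df$time - origins[nearest]
df$diff <- ifelse(abs(d) < window, d, NA_real_)
```

On the sample data this should give the same `diff` column as the join above, without the extra `time.y`, `grp`, `begin` and `end` columns.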