How to compute the time difference of preceding and following rows relative to specific rows
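If a condition is met, I want to calculate the time difference between a specific row and the rows before and after it. I don't want the sequential differences (row 3 - row 2, row 4 - row 3, etc.), but the difference from the middle row. Another way to put it might be the distance from 0.
If the start column reads "yes", I want that row's time to be the origin, but only for 5 seconds. I have about 600,000 rows of a roughly continuous time series, so taking 5 s on either side of each start should hopefully keep the calculations from overlapping. I'm not even sure what this would look like in code. Sample data, with many columns omitted for convenience: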
df <- data.frame(
  stringsAsFactors = FALSE,
  initiate = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L),
  start = c("no","no","yes","no","no",
            "no","no","no","no","yes","no","no","no","no"),
  time = c(2.8225,2.82375,2.825,2.82625,
           2.827,2.82725,16.8075,16.810,16.82,16.8212,16.825,
           16.8262,16.8275,16.8300)
)
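initiate | start | time |
---|---|---|
0 | no | 2.8225 |
0 | no | 2.82375 |
1 | yes | 2.82500 |
1 | no | 2.82625 |
1 | no | 2.82700 |
1 | no | 2.82725 |
0 | no | 16.8075 |
0 | no | 16.8100 |
0 | no | 16.8200 |
1 | yes | 16.8212 |
1 | no | 16.8250 |
0 | no | 16.8262 |
1 | no | 16.8275 |
1 | no | 16.8300 |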
My desired output is:
initiate | start | time | diff |
---|---|---|---|
0 | no | 2.8225 | -0.00250 |
0 | no | 2.82375 | -0.00125 |
1 | yes | 2.82500 | 0 |
1 | no | 2.82625 | 0.00125 |
1 | no | 2.82700 | 0.00200 |
1 | no | 2.82725 | 0.00225 |
0 | no | 16.8075 | -0.0137 |
0 | no | 16.8100 | -0.0112 |
0 | no | 16.8200 | -0.0012 |
1 | yes | 16.8212 | 0 |
1 | no | 16.8250 | 0.00380 |
0 | no | 16.8262 | 0.00500 |
1 | no | 16.8275 | 0.00630 |
1 | no | 16.8300 | 0.00880 |
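Each diff value in the desired output is the row's time minus the time of its "yes" row, counted only within 5 s of that row. A minimal base-R illustration of that arithmetic for the first six sample rows (origin at row 3); the variable names are only for illustration:

```r
# Times of the first six sample rows; the "yes" row is the third one
time <- c(2.8225, 2.82375, 2.825, 2.82625, 2.827, 2.82725)
origin <- time[3]              # the "yes" row's time becomes 0

time_diff <- time - origin     # negative before the origin, positive after it
keep <- abs(time_diff) <= 5    # only rows within 5 s of the origin count
print(round(time_diff, 5))
```

The rounded values match the first six diff entries in the table above.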
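I've tried lag, diff and shift, as well as the code below, but I can't get the calculation to restart at each "yes" row. This is the closest I've gotten, but it only computes correctly from the first "yes".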
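library(dplyr)

# Note: `id` and `trial_start` are not in the sample df above (columns were
# omitted); `trial_start` presumably plays the role of `start` in the sample.
df %>%
  group_by(id, grp = cumsum(lag(start, default = '') == 'yes')) %>%
  mutate(diff = time - time[match('yes', trial_start)]) %>%
  {. ->> df}  # assign the result back to df in the global environment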
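Tracing the grouping expression on the sample start column suggests why the attempt stops working after the first "yes": because of `lag()`, a new group begins on the row *after* each "yes". In this base-R sketch, `c("", head(start, -1))` stands in for `lag(start, default = '')`:

```r
start <- c("no", "no", "yes", "no", "no", "no", "no",
           "no", "no", "yes", "no", "no", "no", "no")

# Shift `start` down one row (like lag(start, default = '')),
# then count the "yes" values seen so far
grp <- cumsum(c("", head(start, -1)) == "yes")

print(grp)                         # group number of each row
print(grp[which(start == "yes")])  # group that each "yes" row falls into
```

Rows 4 to 10 share one group whose only "yes" is row 10, so rows 4 to 6 would be measured from 16.8212 instead of 2.825, and rows 11 to 14 form a group with no "yes" at all, where `match()` returns `NA`. Assigning each row to a window around its "yes" row, as in the answer that follows, avoids this.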
Using fuzzyjoin might be useful here:
library(dplyr)
library(fuzzyjoin)

# One row per "yes" row: its time, a window number, and a +/-5 s window around it
df_grp <- df %>%
  filter(start == "yes") %>%
  select(time) %>%
  group_by(grp = row_number()) %>%
  mutate(begin = time - 5,
         end = time + 5)
First, we create a data.frame of the start ("yes") times with their -5 and +5 bounds:
# A tibble: 2 x 4
time grp begin end
<dbl> <int> <dbl> <dbl>
1 2.82 1 -2.17 7.82
2 16.8 2 11.8 21.8
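Conceptually, the `match_fun = list(\`>\`, \`<\`)` condition in the fuzzy_left_join() step keeps a row-window pair when `begin < time < end`. On the small sample the same test can be spelled out as a logical matrix in base R. This is an illustration only, since the matrix has one entry per row and window, and it assumes the windows do not overlap, as the question expects:

```r
# Sample times and "yes" flags from the question
time  <- c(2.8225, 2.82375, 2.825, 2.82625, 2.827, 2.82725, 16.8075,
           16.810, 16.82, 16.8212, 16.825, 16.8262, 16.8275, 16.8300)
start <- c("no", "no", "yes", "no", "no", "no", "no",
           "no", "no", "yes", "no", "no", "no", "no")

starts <- time[start == "yes"]  # one window per "yes" row
begin  <- starts - 5
end    <- starts + 5

# Rows x windows: TRUE where begin < time < end (the two match_fun comparisons)
in_window <- outer(seq_along(time), seq_along(starts),
                   function(i, j) time[i] > begin[j] & time[i] < end[j])

# First matching window per row (NA if the row lies in no window)
grp <- apply(in_window, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
time_diff <- time - starts[grp]
```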
Next, we use a fuzzy_join to attach it to the original data.frame and compute the difference:
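df %>%
  # keep each (row, window) pair with begin < time < end
  fuzzy_left_join(df_grp,
                  by = c("time" = "begin", "time" = "end"),
                  match_fun = list(`>`, `<`)) %>%
  group_by(grp) %>%
  # time.x: the row's own time; time.y: the time of its "yes" row
  mutate(diff = time.x - time.y) %>%
  ungroup()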
This returns
# A tibble: 14 x 8
initiate start time.x time.y grp begin end diff
<int> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0 no 2.82 2.82 1 -2.17 7.82 -0.00250
2 0 no 2.82 2.82 1 -2.17 7.82 -0.00125
3 1 yes 2.82 2.82 1 -2.17 7.82 0
4 1 no 2.83 2.82 1 -2.17 7.82 0.00125
5 1 no 2.83 2.82 1 -2.17 7.82 0.00200
6 1 no 2.83 2.82 1 -2.17 7.82 0.00225
7 0 no 16.8 16.8 2 11.8 21.8 -0.0137
8 0 no 16.8 16.8 2 11.8 21.8 -0.0112
9 0 no 16.8 16.8 2 11.8 21.8 -0.00120
10 1 yes 16.8 16.8 2 11.8 21.8 0
11 1 no 16.8 16.8 2 11.8 21.8 0.00380
12 0 no 16.8 16.8 2 11.8 21.8 0.00500
13 1 no 16.8 16.8 2 11.8 21.8 0.00630
14 1 no 16.8 16.8 2 11.8 21.8 0.00880
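If fuzzyjoin is not an option, a base-R alternative is to measure each row from its *nearest* "yes" time with `findInterval()` and blank out rows 5 s or more away. This is a sketch under assumptions: the helper name `nearest_origin_diff` is hypothetical, and the nearest-origin rule for rows lying between two starts is this sketch's choice, not part of the answer above. With non-overlapping windows, as the question expects, it should give the same diff values as the join.

```r
# Hypothetical helper: distance of each time from its nearest origin ("yes") time,
# NA when that origin is `window` seconds or more away (strict, like the join)
nearest_origin_diff <- function(time, is_origin, window = 5) {
  origins <- sort(time[is_origin])
  n <- length(origins)
  if (n == 0) return(rep(NA_real_, length(time)))

  # i = number of origins at or before each time (0 if none)
  i <- findInterval(time, origins)
  prev_o <- ifelse(i >= 1, origins[pmax(i, 1)], NA_real_)     # origin at or before
  next_o <- ifelse(i < n, origins[pmin(i + 1, n)], NA_real_)  # next origin after

  # Pick whichever candidate origin is closer (ties go to the earlier one)
  use_prev <- !is.na(prev_o) & (is.na(next_o) | (time - prev_o) <= (next_o - time))
  d <- time - ifelse(use_prev, prev_o, next_o)
  d[abs(d) >= window] <- NA_real_
  d
}

df <- data.frame(
  start = c("no", "no", "yes", "no", "no", "no", "no",
            "no", "no", "yes", "no", "no", "no", "no"),
  time  = c(2.8225, 2.82375, 2.825, 2.82625, 2.827, 2.82725, 16.8075,
            16.810, 16.82, 16.8212, 16.825, 16.8262, 16.8275, 16.8300)
)
df$diff <- nearest_origin_diff(df$time, df$start == "yes", window = 5)
```

`findInterval()` needs the origins sorted, which the helper does itself; the time column does not have to be sorted.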