Using lead and lag from dplyr for differential values within a vector
I have a data frame:
df <- structure(list(Time = structure(c(1531056854, 1531057121, 1517382101,
1517386850, 1517386951, 1517399987, 1517400523, 1517400523), class = c("POSIXct",
"POSIXt")), Data = c("Start", "Exit", "Start", "Start", "Exit",
"Start", "Exit", "Exit"), same = c(0, 0, 1, 0, 0, 0, 1, NA)), class = "data.frame", .Names = c("Time",
"Data", "same"), row.names = c(NA, -8L))
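The ideal case for column 2 is `Start` followed by `Exit`.
However, in some cases I can have `Start` `Start` and `Exit`, or `Start` followed by `Exit` `Exit`. I tried to identify consecutive Starts and Exits with this code: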
library(dplyr)
df <- df %>% mutate(same = ifelse(Data == lead(Data), 1, 0))
This gives me the following output:
Time Data same
1 2018-07-08 19:04:14 Start 0
2 2018-07-08 19:08:41 Exit 0
3 2018-01-31 12:31:41 Start 1
4 2018-01-31 13:50:50 Start 0
5 2018-01-31 13:52:31 Exit 0
6 2018-01-31 17:29:47 Start 0
7 2018-01-31 17:38:43 Exit 1
8 2018-01-31 17:38:43 Exit NA
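I want to figure out how to flag the second `Start` when there are two `Start` rows in a row, and the first `Exit` when there are two `Exit` rows in a row, by marking them as 1. The desired output is: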
Time Data same
1 2018-07-08 19:04:14 Start 0
2 2018-07-08 19:08:41 Exit 0
3 2018-01-31 12:31:41 Start 0
4 2018-01-31 13:50:50 Start 1 #this should be one
5 2018-01-31 13:52:31 Exit 0
6 2018-01-31 17:29:47 Start 0
7 2018-01-31 17:38:43 Exit 1 #this should be one
8 2018-01-31 17:38:43 Exit 0
I tried using `if` conditions inside `ifelse`, but it became a mess.
library(tidyverse)
df %>%
mutate( same2 = ifelse( Data == "Start" & lag( Data ) == Data, 1, 0 )) %>%
mutate( same2 = ifelse( Data == "Exit" & lead( Data ) == Data, 1, same2 ) )
# Time Data same same2
# 1 2018-07-08 15:34:14 Start 0 NA
# 2 2018-07-08 15:38:41 Exit 0 0
# 3 2018-01-31 08:01:41 Start 1 0
# 4 2018-01-31 09:20:50 Start 0 1
# 5 2018-01-31 09:22:31 Exit 0 0
# 6 2018-01-31 12:59:47 Start 0 0
# 7 2018-01-31 13:08:43 Exit 1 1
# 8 2018-01-31 13:08:43 Exit NA NA
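The `NA` values in rows 1 and 8 of `same2` come from R's three-valued logic, not from the data: `lag()` has no previous value for the first row and `lead()` has no next value for the last row, so the comparison there is `NA`. A minimal base R sketch of that behaviour (the variable names are only illustrative):

```r
# Row 1: Data == "Start" is TRUE, but lag(Data) == Data is NA.
first_row <- TRUE & NA
first_row                      # NA: the outcome depends on the unknown value

# A FALSE on the left settles the result regardless of the unknown.
other_row <- FALSE & NA
other_row                      # FALSE

# ifelse() passes an NA test through as NA instead of picking a branch.
flag_first <- ifelse(first_row, 1, 0)
flag_other <- ifelse(other_row, 1, 0)
flag_first                     # NA
flag_other                     # 0
```

Row 8 ends up `NA` for the same reason in the second `mutate()`, where `lead(Data)` is `NA`.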
We can use `as.integer` to coerce the logical vector to binary:
df %>%
mutate(same2 = as.integer((Data == 'Start' & lag(Data) == Data)|
(Data == 'Exit' & lead(Data) == Data)))
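Written as above, both answers still leave `NA` in the first and last rows, whereas the desired output has 0 there. One possible way to close that gap is the `default` argument of `lag()` and `lead()`. The sketch below is not part of the original answers; it rebuilds only the `Data` column, and the empty string used as the default is an assumed placeholder chosen because it never matches `"Start"` or `"Exit"`.

```r
library(dplyr)

# Only the Data column from the question is needed for this illustration.
df <- data.frame(
  Data = c("Start", "Exit", "Start", "Start", "Exit", "Start", "Exit", "Exit"),
  stringsAsFactors = FALSE
)

# default = "" (an assumption) makes the edge comparisons FALSE instead of NA.
result <- df %>%
  mutate(same2 = as.integer(
    (Data == "Start" & lag(Data, default = "") == Data) |   # second of two Starts
      (Data == "Exit" & lead(Data, default = "") == Data)   # first of two Exits
  ))

result$same2
# 0 0 0 1 0 0 1 0
```

If the rows are not already in the intended order, they would need to be sorted first (or `order_by` supplied), since `lag()` and `lead()` work on row position.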