Using lead with dplyr to compute the difference between two time stamps
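I want to find the difference between two time stamps: the time stamp of a row that meets the condition "Start" in one column, and the time stamp of the first subsequent row that meets another condition, "Stop", in the same column. Basically, we use a program to "start" a behavior and "stop" a behavior so that we can calculate the duration of the behavior.
I have tried adapting the code found in this post:
But I can't figure out how to get lead to pick out a subsequent row in the same column that meets the condition. The complication is that there can be "event" behaviors that have a "start" but no "stop". Example data frame.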
Data
Behavior     Modifier_1  Time_relative_s
BodyLength   Start       122.11
Growl        Start       129.70
Body Length  Stop        132.26
Body Length  Start       157.79
Body Length  Stop        258.85
Body Length  Start       270.12
Bark         Start       272.26
Growl        Start       275.68
Body Length  Stop        295.37
I want this:
Behavior     Modifier_1  Time_relative_s  diff
BodyLength   Start       122.11           10.15
Growl        Start       129.70
Body Length  Stop        132.26
Body Length  Start       157.79           101.06
Body Length  Stop        258.85
Body Length  Start       270.12           25.25
Bark         Start       272.26
Growl        Start       275.68
Body Length  Stop        295.37
I have tried using a dplyr pipeline:
test<-u%>%
  filter(Modifier_1 %in% c("Start","Stop")) %>%
  arrange(Time_Relative_s) %>%
  mutate(diff = lead(Time_Relative_s, default = first(Time_Relative_s=="Stop")-Time-Relative_s)
But I must not be using lead correctly, because this just returns Time_Relative_s in the diff column. Any suggestions? Thanks for your help!
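We may need to create a grouping variable based on the occurrence of 'Stop', and then, within each group, take the difference of the 'Time_relative_s' values at the positions of the first 'Stop' and the first 'Start' in 'Modifier_1'.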
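library(dplyr)
df1 %>%
   # a new group begins on the row after each "Stop"
   group_by(grp = cumsum(lag(Modifier_1 == "Stop", default = FALSE))) %>%
   # first "Stop" time minus first "Start" time within the group
   mutate(diff = Time_relative_s[match("Stop", Modifier_1)] -
                 Time_relative_s[match("Start", Modifier_1)],
          # keep the value only on the first row of each group
          diff = replace(diff, row_number() > 1, NA_real_)) %>%
   ungroup() %>%
   select(-grp)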
# A tibble: 9 x 4
# Behavior Modifier_1 Time_relative_s diff
# <chr> <chr> <dbl> <dbl>
#1 BodyLength Start 122. 10.1
#2 Growl Start 130. NA
#3 Body Length Stop 132. NA
#4 Body Length Start 158. 101.
#5 Body Length Stop 259. NA
#6 Body Length Start 270. 25.2
#7 Bark Start 272. NA
#8 Growl Start 276. NA
#9 Body Length Stop 295. NA
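To see why the grouping works, the grouping variable can be computed on its own. This is a minimal sketch in base R, not part of the code above: `is_stop` and `lagged` are illustrative names, and `c(FALSE, head(is_stop, -1))` is used as a stand-in for `lag(..., default = FALSE)` so that the snippet runs without dplyr.

```r
# Modifier column from the example data.
Modifier_1 <- c("Start", "Start", "Stop", "Start", "Stop",
                "Start", "Start", "Start", "Stop")

# TRUE on every row that closes a behavior.
is_stop <- Modifier_1 == "Stop"

# Shift the flags down by one row, filling the first position with FALSE.
# (Base-R stand-in for dplyr::lag(is_stop, default = FALSE).)
lagged <- c(FALSE, head(is_stop, -1))

# The running count of earlier "Stop" rows is the group id, so each
# "Stop" stays in the same group as the "Start" rows that precede it.
grp <- cumsum(lagged)
print(grp)
# [1] 0 0 0 1 1 2 2 2 2
```

Each group therefore runs from the row after one "Stop" up to and including the next "Stop", which is why the first "Start" and the first "Stop" in a group delimit one duration.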
Data
df1 <- structure(list(Behavior = c("BodyLength", "Growl", "Body Length",
"Body Length", "Body Length", "Body Length", "Bark", "Growl",
"Body Length"), Modifier_1 = c("Start", "Start", "Stop", "Start",
"Stop", "Start", "Start", "Start", "Stop"), Time_relative_s = c(122.11,
129.7, 132.26, 157.79, 258.85, 270.12, 272.26, 275.68, 295.37
)), row.names = c(NA, -9L), class = "data.frame")
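The question mentions behaviors that have a "Start" but never get a "Stop". As a hedged sketch of how the same pipeline behaves in that case, the made-up data frame `df2` below (not from the question) ends with an unmatched "Start". `match("Stop", Modifier_1)` returns NA for that last group, so its diff comes out as NA rather than raising an error.

```r
library(dplyr)

# Hypothetical data: the final "Start" is never followed by a "Stop".
df2 <- data.frame(
  Behavior        = c("Growl", "Growl", "Bark"),
  Modifier_1      = c("Start", "Stop", "Start"),
  Time_relative_s = c(1.5, 4.0, 6.0)
)

res <- df2 %>%
  # a new group begins on the row after each "Stop"
  group_by(grp = cumsum(lag(Modifier_1 == "Stop", default = FALSE))) %>%
  # match() gives NA when the group has no "Stop", so diff becomes NA
  mutate(diff = Time_relative_s[match("Stop", Modifier_1)] -
                Time_relative_s[match("Start", Modifier_1)],
         diff = replace(diff, row_number() > 1, NA_real_)) %>%
  ungroup() %>%
  select(-grp)

print(res$diff)
# [1] 2.5  NA  NA
```

Note that this approach pairs the first "Start" in each group with the next "Stop" regardless of the value in Behavior. If durations must be matched per behavior, Behavior would have to enter the grouping as well.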