difftime in R dataframe based on subset
I have this example data frame, which tracks when lamps are switched on and off.
                 time lamp status
1 2015-01-01 12:18:17    2     ON
2 2015-01-01 13:07:29   28     ON
3 2015-01-01 13:11:50   28    OFF
4 2015-01-01 13:18:28    2    OFF
5 2015-01-01 14:07:29   28     ON
6 2015-01-01 14:11:35   28    OFF
7 2015-01-01 14:18:28    2     ON
8 2015-01-01 14:18:57    2    OFF
What I want to achieve is to add a fourth column containing how long each lamp stayed on, in seconds.
Desired output:
                 time lamp status duration
1 2015-01-01 12:18:17    2     ON     3611
2 2015-01-01 13:07:29   28     ON      261
3 2015-01-01 13:11:50   28    OFF       NA
4 2015-01-01 13:18:28    2    OFF       NA
5 2015-01-01 14:07:29   28     ON      246
6 2015-01-01 14:11:35   28    OFF       NA
7 2015-01-01 14:18:28    2     ON       29
8 2015-01-01 14:18:57    2    OFF       NA
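I have managed to do this with a custom function involving while and for loops. But...
I am a beginner in R, and I am fairly sure this can be done more simply and elegantly (using subset, apply, and/or ....). I just can't figure out how.
Any ideas that point me in the right direction?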
This works for me:
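library(dplyr)
# Per lamp, the gap to the next event is the duration (assumes time-ordered rows).
df <- df %>%
  mutate(sec = as.numeric(time)) %>%       # seconds since the epoch
  group_by(lamp) %>%
  mutate(duration = c(diff(sec), NA)) %>%  # a lamp's last event has no successor
  ungroup() %>%
  select(-sec)
df$duration[df$status == "OFF"] <- NA      # only ON rows keep a duration
#### 1 2015-01-01 12:18:17    2     ON     3611
#### 2 2015-01-01 13:07:29   28     ON      261
#### 3 2015-01-01 13:11:50   28    OFF       NA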
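To see why this works, it may help to trace the calculation by hand for a single lamp. The sketch below uses the four event times of lamp 2 from the data (as epoch seconds); the names `sec`, `gaps` and `status` are only illustrative.

```r
# Event times for lamp 2 as epoch seconds, in order: ON, OFF, ON, OFF
sec    <- c(1420111097, 1420114708, 1420118308, 1420118337)
status <- c("ON", "OFF", "ON", "OFF")

# Time from each event to the lamp's next event; the last event has none
gaps <- c(diff(sec), NA)
# gaps is now 3611, 3600, 29, NA

# The 3600 s gap starts at an OFF event, so it is an "off" period and is dropped
gaps[status == "OFF"] <- NA
# gaps is now 3611, NA, 29, NA
```

group_by(lamp) simply repeats this calculation once per lamp, which is why the rows of different lamps can be interleaved in the data frame.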
Your data:
# tzone = "" prints in the local time zone; the times shown above correspond to UTC+1
df <- structure(list(time = structure(c(1420111097, 1420114049, 1420114310,
1420114708, 1420117649, 1420117895, 1420118308, 1420118337), class = c("POSIXct",
"POSIXt"), tzone = ""), lamp = c(2L, 28L, 28L, 2L, 28L, 28L,
2L, 2L), status = structure(c(2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("OFF",
"ON"), class = "factor"), duration = c(3611, 261, NA, NA, 246,
NA, 29, NA)), .Names = c("time", "lamp", "status", "duration"
), row.names = c(NA, -8L), class = "data.frame")
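If you would rather avoid dplyr, roughly the same idea can be sketched in base R with ave(), which applies a function within groups and returns the results in the original row order. This is only a sketch under the same assumptions as above: rows are in chronological order and each ON for a lamp is followed by an OFF for that lamp. The data frame is rebuilt here so the snippet runs on its own; `tz = "UTC"` is chosen just for reproducibility (it does not affect the durations), and `next_time` is a made-up helper name.

```r
# Rebuild the example data from the epoch seconds given above
df <- data.frame(
  time = as.POSIXct(c(1420111097, 1420114049, 1420114310, 1420114708,
                      1420117649, 1420117895, 1420118308, 1420118337),
                    origin = "1970-01-01", tz = "UTC"),
  lamp = c(2L, 28L, 28L, 2L, 28L, 28L, 2L, 2L),
  status = c("ON", "ON", "OFF", "OFF", "ON", "OFF", "ON", "OFF")
)

# For every row, the time of the same lamp's next event (NA for its last event)
next_time <- ave(as.numeric(df$time), df$lamp,
                 FUN = function(s) c(s[-1], NA))

# Seconds until that next event, kept only where the lamp was switched ON
df$duration <- ifelse(df$status == "ON",
                      next_time - as.numeric(df$time),
                      NA)
```

With the example data, df$duration should come out as 3611, 261, NA, NA, 246, NA, 29, NA, matching the desired output in the question.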