How to make a timespan column based on a timestamp column?
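I have a data table with timestamps in column 1 and events in column 2. The timestamps have the format Y-m-d H:M:OS3 (e.g. "2019-09-26 07:29:22,778").
I would like to add a new column containing the time span between timestamp 2 and timestamp 1, then between 3 and 1, and so on. For example: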
timestamp                 event   diff in sec
2019-09-26 07:29:22,778   X
2019-09-26 07:29:23,918   Y       1.140
2019-09-26 07:29:25,118   Z       2.340
...
We can use difftime:
library(dplyr)
library(lubridate)

df1 %>%
  mutate(timestamp = ymd_hms(timestamp),
         # seconds since the previous row, accumulated into seconds since the first row
         diffinsec = cumsum(as.numeric(difftime(timestamp,
                                                lag(timestamp, default = timestamp[1]),
                                                units = "secs"))))
In base R:
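# units = "secs" keeps the result in seconds even when the gaps between rows are large
dt1$timediff <- cumsum(c(0, as.numeric(difftime(tail(dt1$timestamp, -1), head(dt1$timestamp, -1), units = "secs"))))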
Or using data.table:
library(data.table)
dt1[ , timediff := cumsum(c(0, diff(as.numeric(timestamp))))][]
#> timestamp event timediff
#> 1: 2019-09-26 07:29:22.778 X 0.00
#> 2: 2019-09-26 07:29:23.917 Y 1.14
#> 3: 2019-09-26 07:29:25.118 Z 2.34
Another dplyr solution, based on akrun's answer:
library(dplyr)

dt1 %>%
  # difference of every row from the first timestamp, in seconds
  mutate(difftime = difftime(timestamp, timestamp[1], units = "secs"))
Data:
N.B.: I am using data.table to read the data.
fread(text="date time event
2019-09-26 07:29:22.778 X
2019-09-26 07:29:23.918 Y
2019-09-26 07:29:25.118 Z") -> dt1
dt1$timestamp <- as.POSIXct(paste(dt1$date, dt1$time), format="%Y-%m-%d %H:%M:%OS")
dt1 <- dt1[,4:3]
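The data above uses a dot before the milliseconds, while the question shows a comma. If the raw strings really contain a comma, one possible way to handle it is to swap it for a dot before parsing. This is only a sketch in base R: the vector `raw` is a hypothetical stand-in for the real column, and `tz = "UTC"` is an assumption.

```r
# Hypothetical raw strings in the comma-separated format from the question.
raw <- c("2019-09-26 07:29:22,778",
         "2019-09-26 07:29:23,918",
         "2019-09-26 07:29:25,118")

# Swap the decimal comma for a dot so that %OS can read the fractional seconds.
# tz = "UTC" is an assumption; use the time zone the data was recorded in.
timestamp <- as.POSIXct(sub(",", ".", raw, fixed = TRUE),
                        format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Seconds elapsed since the first timestamp.
elapsed <- as.numeric(difftime(timestamp, timestamp[1], units = "secs"))
round(elapsed, 3)
```

Setting options(digits.secs = 3) makes R print the milliseconds when the timestamps are displayed.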
Here is a dplyr solution. I assume you want the time difference from the first event. Otherwise @akrun's lag() answer is the right one.
library(dplyr)
df %>%
mutate(start = min(timestamp)) %>%
mutate(diff = timestamp - start)
#> timestamp event start diff
#> 1 2019-09-26 07:29:22 X 2019-09-26 07:29:22 0.00 secs
#> 2 2019-09-26 07:29:23 Y 2019-09-26 07:29:22 1.14 secs
#> 3 2019-09-26 07:29:25 Z 2019-09-26 07:29:22 2.34 secs
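To make the two readings of the question concrete, here is a small base R sketch that computes both the time since the first event and the gap to the previous event. The column names `since_first` and `since_previous` are made up for illustration, and `tz = "UTC"` is an assumption so the example is reproducible.

```r
# The same three timestamps as in this answer's data, as seconds since the epoch.
# tz = "UTC" is an assumption made only so the example is reproducible.
timestamp <- as.POSIXct(c(1569479362.778, 1569479363.918, 1569479365.118),
                        origin = "1970-01-01", tz = "UTC")
secs <- as.numeric(timestamp)

since_first    <- secs - secs[1]     # elapsed time from the first event
since_previous <- c(0, diff(secs))   # gap to the preceding event

data.frame(event = c("X", "Y", "Z"), since_first, since_previous)

# Summing the consecutive gaps recovers the elapsed time from the first event.
all.equal(cumsum(since_previous), since_first)
```

This is why the lag()-plus-cumsum() answers and the "difference from the first row" answers end up with the same column.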
Data:
df <- structure(list(timestamp = structure(c(1569479362.778, 1569479363.918,
1569479365.118), class = c("POSIXct", "POSIXt"), tzone = ""),
event = c("X", "Y", "Z")), row.names = c(NA,
-3L), class = "data.frame")