How to break down a datetime or interval object into one row per minute in R
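I have a dataset with a datetime (start) column and a datetime_end column. After processing the data, I want to break each interval down minute by minute, one row per minute. Say I have this interval: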
datetime datetime_end id disc
2019-03-19 12:47:28 2019-03-19 12:50:37 5-3 start
I want to break it down by minute into something like this:
datetime id disc
2019-03-19 12:48:00 5-3 start
2019-03-19 12:49:00 5-3 start
2019-03-19 12:50:00 5-3 start
2019-03-19 12:51:00 5-3 start
Here is a dummy data frame:
df1 <- data.frame(stringsAsFactors = FALSE,
  datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
               "2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
               "2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z", "2019-03-19T13:05:21Z",
               "2019-03-19T13:04:37Z", "2019-03-19T12:47:28Z",
               "2019-03-19T12:46:42Z"),
  id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
         "5-3"),
  disc = c("car", "stop", "start", "stop", "start", "stop", "start",
           "stop", "start", "stop")
)
I tried using the lubridate::interval function to build an interval object (the travel interval), but I am struggling to break it down by minute, one row each, as shown above. If anyone knows a solution, I would appreciate it.
Here is my script:
library(tidyverse)
library(lubridate)
df <- df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime),
         # Create an interval object.
         Travel_Interval =
           lubridate::interval(start = datetime, end = datetime_end)) %>%
  filter(!is.na(Travel_Interval)) %>%
  # select(-Travel_Interval)
  select(datetime, datetime_end, id, disc, Travel_Interval) %>%
  filter(disc == "start")
For this I would use purrr::map2():
# take df1 %>% mutate datetime column to datetime format %>% sort by datetime
# %>% add datetime_end as lead of datetime %>% filter out records with no
# recorded datetime_end %>% mutate to create column 'minute' by using
# purrr::map2 to iterate over each datetime and datetime_end pair and apply the
# following function {create a sequence of timestamps starting at the "minute
# ceiling" of 'datetime' and ending at the "minute ceiling" of
# 'datetime_end' in one-minute intervals} %>% since the resultant column is a
# list, we have to unnest the data
df <- df1 %>%
  mutate(datetime = as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime, n = 1L)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate(minute = purrr::map2(datetime, datetime_end, function(start, stop) {
    seq.POSIXt(from = ceiling_date(start, 'minute'), to = ceiling_date(stop, 'minute'), by = 'min')
  })) %>%
  # name the list column explicitly; recent tidyr versions warn on a bare unnest()
  unnest(minute)
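Note, however, that because you are effectively cutting the timestamps into minute intervals with some form of rounding (here, taking the ceiling), you will have to decide how to handle boundary cases. For example, the last row of the first disc == "stop" run ends with minute == 2019-03-19 12:48:00, but the first row of the following disc == "start" run also begins with minute == 2019-03-19 12:48:00: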
datetime id disc datetime_end minute
1 2019-03-19 12:46:42 5-3 stop 2019-03-19 12:47:28 2019-03-19 12:47:00
2 2019-03-19 12:46:42 5-3 stop 2019-03-19 12:47:28 2019-03-19 12:48:00
3 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:48:00
4 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:49:00
5 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:50:00
6 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:51:00
7 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:52:00
8 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:53:00
9 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:54:00
10 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:55:00
11 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:56:00
12 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:57:00
13 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:58:00
14 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:59:00
15 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:00:00
16 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:01:00
17 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:02:00
18 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:03:00
19 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:04:00
20 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:05:00
21 2019-03-19 13:04:37 5-3 stop 2019-03-19 13:05:21 2019-03-19 13:05:00
22 2019-03-19 13:04:37 5-3 stop 2019-03-19 13:05:21 2019-03-19 13:06:00
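One possible way to settle that boundary case is to treat each interval as half-open, so the closing minute is left to whichever interval starts next. The sketch below is only an illustration of that choice, not part of the answer above: expand_minutes() is a hypothetical helper name, and dropping the last minute (rather than the first) is an assumption you may want to reverse.

```r
library(lubridate)

# Hypothetical helper (not from the original answer): expand one interval into
# minute stamps, stopping one minute short of the ceiling of `stop` so that the
# closing minute belongs to the next interval instead.
expand_minutes <- function(start, stop) {
  first <- ceiling_date(start, "minute")
  last  <- ceiling_date(stop, "minute") - minutes(1)
  if (last < first) {
    return(first[0])  # empty POSIXct vector when the interval covers no full step
  }
  seq(first, last, by = "min")
}

# The first two intervals from the output above
starts <- as_datetime(c("2019-03-19T12:46:42Z", "2019-03-19T12:47:28Z"))
stops  <- as_datetime(c("2019-03-19T12:47:28Z", "2019-03-19T13:04:37Z"))

stop_run  <- expand_minutes(starts[1], stops[1])  # the "stop" run
start_run <- expand_minutes(starts[2], stops[2])  # the "start" run

# TRUE would mean the two runs still share a minute stamp
any(stop_run %in% start_run)
```

The same function can be dropped into the map2() call in place of the anonymous function if this convention suits your data.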
df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate_at(vars(contains("datetime")), ~ round_date(.x + seconds(30), unit = "minute")) %>%
  mutate(diff = time_length(interval(datetime, datetime_end), unit = "minutes")) %>%
  mutate(time = map2(datetime, diff, ~ .x + minutes(seq(0, .y)))) %>%
  unnest(time)
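Just wanted to post this since I was already working on it, even though there is already a good answer. This uses the lubridate functions time_length and interval to get the sequence.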
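To make the time_length / interval step easier to follow, here is a minimal sketch that applies the same operations to the single interval from the question, outside of any pipeline. The variable names (start_min, end_min, n_minutes, stamps) are made up for illustration.

```r
library(lubridate)

# The single interval given in the question
start <- as_datetime("2019-03-19 12:47:28")
end   <- as_datetime("2019-03-19 12:50:37")

# Same rounding as in the pipeline: shift by 30 seconds, then round to the minute
start_min <- round_date(start + seconds(30), unit = "minute")
end_min   <- round_date(end + seconds(30), unit = "minute")

# Length of the rounded interval, expressed in minutes
n_minutes <- time_length(interval(start_min, end_min), unit = "minutes")

# One timestamp per minute, from the rounded start to the rounded end
stamps <- start_min + minutes(seq(0, n_minutes))
format(stamps, "%H:%M:%S")
```

On this input the stamps run from 12:48:00 to 12:51:00, which matches the four rows asked for in the question.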