How to break down a datetime or interval object into one row per minute in R

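I have a dataset with a datetime (start) column and a datetime_end column. After processing the data, I want to break each interval down into one row per minute. Suppose I have this interval: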

datetime                datetime_end          id   disc
2019-03-19 12:47:28     2019-03-19 12:50:37   5-3 start

I want to break it down by minute into something like this:

datetime                id   disc
2019-03-19 12:48:00     5-3  start
2019-03-19 12:49:00     5-3  start
2019-03-19 12:50:00     5-3  start
2019-03-19 12:51:00     5-3  start

Here is a dummy data frame:

df1 <- data.frame(stringsAsFactors=FALSE,
                  datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
                               "2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
                               "2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z", "2019-03-19T13:05:21Z",
                               "2019-03-19T13:04:37Z", "2019-03-19T12:47:28Z",
                               "2019-03-19T12:46:42Z"),
                  id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
                         "5-3"),
                  disc = c("car", "stop", "start", "stop", "start", "stop", "start",
                           "stop", "start", "stop")
)

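I tried using the lubridate::interval function to create an interval object (the travel interval), but I am struggling to break it down into one row per minute (as shown above). If anyone knows a solution, I would appreciate it.

Here is my script: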

library(tidyverse)
library(lubridate)
  df <- df1 %>% 
    mutate(datetime = lubridate::as_datetime(datetime)) %>% 
    arrange(datetime) %>% 
    mutate(datetime_end = lead(datetime), 
           # Create an interval object.
           Travel_Interval = 
             lubridate::interval(start = datetime, end = datetime_end)) %>% 
    filter(!is.na(Travel_Interval)) %>% 
    # select(-Travel_Interval)
    select(datetime,datetime_end , id , disc,Travel_Interval) %>% 
    filter(disc == "start")

For this I would use purrr::map2():

# Take df1, convert the datetime column to datetime format, and sort by
# datetime. Add datetime_end as the lead of datetime and filter out records
# with no datetime_end. Then use purrr::map2() to iterate over each
# (datetime, datetime_end) pair and build a sequence of timestamps in
# one-minute steps, from the "minute ceiling" of datetime to the "minute
# ceiling" of datetime_end. The resulting column is a list column, so it has
# to be unnested at the end.
df <- df1 %>% 
  mutate(datetime = as_datetime(datetime)) %>% 
  arrange(datetime) %>% 
  mutate(datetime_end = lead(datetime, n = 1L)) %>% 
  filter(!is.na(datetime_end)) %>% 
  mutate(minute = purrr::map2(datetime, datetime_end, function(start, stop) {
    seq.POSIXt(from = ceiling_date(start, 'minute'), to = ceiling_date(stop, 'minute'), by = 'min')
  })) %>% 
  unnest(minute)
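
To see what the function inside map2() produces for a single row, here is a minimal base R sketch applied to the example interval from the question. The helper name ceil_min is hypothetical and only stands in for ceiling_date(x, 'minute'); it assumes UTC timestamps.

```r
# Hypothetical helper: round a POSIXct value up to the next full minute
# (an exact minute is left unchanged). Assumes UTC.
ceil_min <- function(x) {
  as.POSIXct(ceiling(as.numeric(x) / 60) * 60,
             origin = "1970-01-01", tz = "UTC")
}

start <- as.POSIXct("2019-03-19 12:47:28", tz = "UTC")
end   <- as.POSIXct("2019-03-19 12:50:37", tz = "UTC")

# One timestamp per minute, from the ceiling of start to the ceiling of end
minute_seq <- seq(ceil_min(start), ceil_min(end), by = "min")
format(minute_seq, "%H:%M:%S")
# "12:48:00" "12:49:00" "12:50:00" "12:51:00"
```

These four timestamps are the four rows asked for in the question.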

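Note, however, that because you are effectively cutting the timestamps into minute intervals using some form of rounding (here, taking the ceiling), you will have to decide how to handle the boundary cases. For example, the last row of the first disc == "stop" run ends with minute == 2019-03-19 12:48:00, but the first row of the disc == "start" run that follows also begins with minute == 2019-03-19 12:48:00: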

              datetime  id  disc        datetime_end              minute
1  2019-03-19 12:46:42 5-3  stop 2019-03-19 12:47:28 2019-03-19 12:47:00
2  2019-03-19 12:46:42 5-3  stop 2019-03-19 12:47:28 2019-03-19 12:48:00
3  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:48:00
4  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:49:00
5  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:50:00
6  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:51:00
7  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:52:00
8  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:53:00
9  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:54:00
10 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:55:00
11 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:56:00
12 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:57:00
13 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:58:00
14 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:59:00
15 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:00:00
16 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:01:00
17 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:02:00
18 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:03:00
19 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:04:00
20 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:05:00
21 2019-03-19 13:04:37 5-3  stop 2019-03-19 13:05:21 2019-03-19 13:05:00
22 2019-03-19 13:04:37 5-3  stop 2019-03-19 13:05:21 2019-03-19 13:06:00
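
One possible way to handle this boundary case, sketched here as an assumption rather than as part of the answer above, is to treat each interval as half-open, so the shared minute is assigned only to the run that follows. The helpers ceil_min and minutes_half_open are hypothetical names, written in base R and assuming UTC timestamps.

```r
# Hypothetical helper: round a POSIXct value up to the next full minute.
ceil_min <- function(x) {
  as.POSIXct(ceiling(as.numeric(x) / 60) * 60,
             origin = "1970-01-01", tz = "UTC")
}

# Hypothetical helper: minutes in [ceiling(start), ceiling(end)),
# i.e. the final (shared) minute is dropped from the earlier interval.
minutes_half_open <- function(start, end) {
  from <- ceil_min(start)
  to   <- ceil_min(end) - 60
  if (to < from) return(from[0])  # interval covers no full-minute mark
  seq(from, to, by = "min")
}

t1 <- as.POSIXct("2019-03-19 12:46:42", tz = "UTC")  # stop
t2 <- as.POSIXct("2019-03-19 12:47:28", tz = "UTC")  # start
t3 <- as.POSIXct("2019-03-19 13:04:37", tz = "UTC")  # stop

stop_run  <- minutes_half_open(t1, t2)  # 12:47 only
start_run <- minutes_half_open(t2, t3)  # 12:48 through 13:04

# No minute appears in both runs
length(intersect(as.numeric(stop_run), as.numeric(start_run)))
```

Whether the shared minute should belong to the earlier or the later run depends on what the per-minute rows are used for, so the opposite choice (dropping the first minute instead) may suit other cases better.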

df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>% 
  arrange(datetime) %>% 
  mutate(datetime_end = lead(datetime)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate_at(vars(contains("datetime")), ~ round_date(.x + seconds(30), unit = "minute")) %>%
  mutate(diff = time_length(interval(datetime, datetime_end), unit = "minutes")) %>%
  mutate(time = map2(datetime, diff, ~ .x + minutes(seq(0, .y)))) %>%
  unnest(time)

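Just wanted to post this since I was already working on it, even though there are already good answers. It uses the lubridate functions time_length and interval to get the sequence.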
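
As a small illustration of that idea on a single pair of timestamps, the sketch below walks through the same steps as the pipeline: shift by 30 seconds and round to the minute, measure the interval length in minutes, then add 0..n minutes to the start. The variable names are hypothetical, and the example assumes lubridate is installed.

```r
library(lubridate)

start <- ymd_hms("2019-03-19 12:47:28")
end   <- ymd_hms("2019-03-19 12:50:37")

# Adding 30 seconds before rounding pushes each timestamp up to the next minute
start_min <- round_date(start + seconds(30), unit = "minute")  # 12:48:00
end_min   <- round_date(end + seconds(30), unit = "minute")    # 12:51:00

# Length of the rounded interval in whole minutes
n_minutes <- time_length(interval(start_min, end_min), unit = "minutes")

# One timestamp per minute: start + 0, 1, ..., n_minutes minutes
minute_rows <- start_min + minutes(seq(0, n_minutes))
format(minute_rows, "%H:%M")
# "12:48" "12:49" "12:50" "12:51"
```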