Mapping ymd_hms time into 15 minute time intervals

I am looking for a data.table solution to the following problem.

Suppose I have the following dataset:

library(data.table)
library(lubridate)

DT <- data.table(date = c("2019-01-01 00:03:04", 
                          "2019-01-01 00:07:03",
                          "2019-01-01 00:15:23",
                          "2019-01-01 00:16:28",
                          "2019-01-01 00:21:30"))
DT
         date
1: 2019-01-01 00:03:04
2: 2019-01-01 00:07:03
3: 2019-01-01 00:15:23
4: 2019-01-01 00:16:28
5: 2019-01-01 00:21:30


I would like to map each date to a 15-minute interval within its day, so that the final dataset looks like this:

   date                    date_15min_grp
1: 2019-01-01 00:03:04              1
2: 2019-01-01 00:07:03              1
3: 2019-01-01 00:15:23              2
4: 2019-01-01 00:16:28              2
5: 2019-01-01 00:21:30              2

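The groups are unique at the day level, so 2019-01-01 00:03:04 and 2019-01-02 00:03:04 should both return 1. My current solution extracts the hour/minute/second components of each timestamp, adds them together (after converting to minutes), divides by 60, and tries to map each value to the nearest "group". This seems slow and inelegant.

A data.table/lubridate solution would be much appreciated.

Many thanks!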

Please find below a solution using data.table and lubridate.

Reprex

  • Code
library(data.table)
library(lubridate)

DT[, date_15min_grp := fcase(minute(date) < 15, 1,
                             minute(date) < 30, 2,
                             minute(date) < 45, 3,
                             default = 4)][]
  • Output
#>                   date date_15min_grp
#> 1: 2019-01-01 00:03:04              1
#> 2: 2019-01-01 00:07:03              1
#> 3: 2019-01-01 00:15:23              2
#> 4: 2019-01-01 00:16:28              2
#> 5: 2019-01-01 00:21:30              2

Created on 2021-11-30 by the reprex package (v2.0.1)

As a follow-up to your comment, this version also takes the hour into account:

  • Code
library(data.table)
library(lubridate)

DT[, date_15min_grp := fcase(minute(date) < 15, hour(date)*4 + 1,
                             minute(date) < 30, hour(date)*4 + 2,
                             minute(date) < 45, hour(date)*4 + 3,
                             minute(date) < 60, hour(date)*4 + 4)][]
  • Output
#>                   date date_15min_grp
#> 1: 2019-01-01 00:03:04              1
#> 2: 2019-01-01 00:07:03              1
#> 3: 2019-01-01 00:15:23              2
#> 4: 2019-01-01 00:16:28              2
#> 5: 2019-01-01 00:21:30              2

Created on 2021-12-01 by the reprex package (v2.0.1)
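
The four fcase branches above follow a regular pattern, so the same index can probably be computed with integer arithmetic instead. The following is only a sketch under a few assumptions: the date column is parsed with ymd_hms() (which uses UTC), and the sixth timestamp is a hypothetical row added to show a value later in the day.

```r
library(data.table)
library(lubridate)

DT <- data.table(date = c("2019-01-01 00:03:04",
                          "2019-01-01 00:07:03",
                          "2019-01-01 00:15:23",
                          "2019-01-01 00:16:28",
                          "2019-01-01 00:21:30",
                          "2019-01-02 13:47:00"))  # hypothetical extra row

# Parse once so hour() and minute() do not re-parse the strings on every call
DT[, date := ymd_hms(date)]

# Four intervals per hour, plus the quarter of the hour the minute falls in (1-based)
DT[, date_15min_grp := hour(date) * 4L + minute(date) %/% 15L + 1L]
DT
```

Because the index depends only on the time of day, the same clock time on different days maps to the same group, which is what the question asks for.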

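You should check whether the slider package fits your needs. It can split data into row-wise blocks and can also apply functions in the style of apply / purrr.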

library(tibble)
library(slider)
library(dplyr)

# Note: the last two timestamps were written as "00:61:23" and "00:66:28",
# which are not valid times; they are assumed here to mean 01:01:23 and 01:06:28.
f <- data.frame(date = as.POSIXct(c(
         "2019-01-01 00:03:04", "2019-01-01 00:07:03", "2019-01-01 00:15:23",
         "2019-01-01 00:16:28", "2019-01-01 00:21:30", "2019-01-01 00:22:03",
         "2019-01-01 00:25:23", "2019-01-01 00:36:28", "2019-01-01 00:41:30",
         "2019-01-01 00:47:03", "2019-01-01 00:48:23", "2019-01-01 00:51:28",
         "2019-01-01 00:51:30", "2019-01-01 00:57:03", "2019-01-01 01:01:23",
         "2019-01-01 01:06:28"))) %>% arrange(date)

g <- block(f, f$date, period = "minute", every=15)
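
As far as I can tell, block() returns a list with one element per non-empty 15-minute block, not a column of group ids. A minimal sketch of turning that list back into an id column follows; the shortened data, the explicit UTC time zone, and the column name grp are assumptions made for illustration.

```r
library(slider)

# A shortened, already sorted version of the data (block() needs an increasing index)
f <- data.frame(date = as.POSIXct(c("2019-01-01 00:03:04",
                                    "2019-01-01 00:07:03",
                                    "2019-01-01 00:15:23",
                                    "2019-01-01 00:36:28",
                                    "2019-01-01 01:01:23"), tz = "UTC"))

# One list element (a data frame) per 15-minute block that contains rows
g <- block(f, f$date, period = "minute", every = 15)

# Repeat each block's position once per row it contains
f$grp <- rep(seq_along(g), times = vapply(g, nrow, integer(1)))
f
```

Note that these ids count the blocks that actually occur, in order, so an empty quarter of an hour does not consume a number. That differs from an index tied to the time of day.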

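Two things:

  1. I think your row 5 should be in group 2, since it falls before 00:30:00 on that day. If that is not correct, please clarify.

  2. You say the groups should be computed per day, but your data contains only one day; I will augment it to demonstrate the per-day calculation.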

DT[, date := as.POSIXct(date)]
# copy() is needed here: := modifies DT by reference, so without it
# both halves of DT2 would end up shifted by one day
DT2 <- rbindlist(list(DT, copy(DT)[, date := date + 86400]))
DT2
#                    date
#                  <POSc>
#  1: 2019-01-01 00:03:04
#  2: 2019-01-01 00:07:03
#  3: 2019-01-01 00:15:23
#  4: 2019-01-01 00:16:28
#  5: 2019-01-01 00:21:30
#  6: 2019-01-02 00:03:04
#  7: 2019-01-02 00:07:03
#  8: 2019-01-02 00:15:23
#  9: 2019-01-02 00:16:28
# 10: 2019-01-02 00:21:30

And the group assignment:

DT2[, day := format(date, format = "%Y%m%d")
  ][, grp := findInterval(date,
                          seq(lubridate::floor_date(min(date), unit = "hours"),
                              max(date) + 3600, by = "15 mins")),
    by = day][]
#                    date      day   grp
#                  <POSc>   <char> <int>
#  1: 2019-01-01 00:03:04 20190101     1
#  2: 2019-01-01 00:07:03 20190101     1
#  3: 2019-01-01 00:15:23 20190101     2
#  4: 2019-01-01 00:16:28 20190101     2
#  5: 2019-01-01 00:21:30 20190101     2
#  6: 2019-01-02 00:03:04 20190102     1
#  7: 2019-01-02 00:07:03 20190102     1
#  8: 2019-01-02 00:15:23 20190102     2
#  9: 2019-01-02 00:16:28 20190102     2
# 10: 2019-01-02 00:21:30 20190102     2

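
One caveat with the breaks above is that they start at the first observed hour of each day, so a day whose earliest record is at, say, 13:47 would start counting from 13:00. If the index should always count from midnight, a possible variation is to apply findInterval() to seconds since midnight. This is only a sketch: the data, the helper column secs, and the vector name breaks are made up for illustration, and ymd_hms() is assumed for parsing (UTC).

```r
library(data.table)
library(lubridate)

DT2 <- data.table(date = ymd_hms(c("2019-01-01 00:03:04",
                                   "2019-01-01 00:21:30",
                                   "2019-01-02 00:15:23",
                                   "2019-01-02 13:47:00")))

# Left edges of the 96 quarter-hour intervals, in seconds since midnight
breaks <- seq(0, 86400 - 900, by = 900)

# Seconds elapsed since the start of each row's own day
DT2[, secs := as.numeric(difftime(date, floor_date(date, "day"), units = "secs"))]

# findInterval() returns the 1-based position of the interval each value falls in
DT2[, grp := findInterval(secs, breaks)]
DT2
```

Since every row is measured against its own midnight, no by = day grouping is needed in this variant.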
library(data.table)
library(lubridate)

DT <- data.table(date = c("2019-01-01 00:03:04", 
                          "2019-01-01 00:07:03",
                          "2019-01-01 00:15:23",
                          "2019-01-01 00:16:28",
                          "2019-01-01 00:21:30"))

# if every new day falls in a new group
DT[, date_15min_grp := .GRP, by = floor_date(ymd_hms(date), "15 minutes")]

# if every new day needs to fall in the same group
DT[, date_15min_grp := .GRP, by = format(floor_date(ymd_hms(date), "15 minutes"), "%H:%M:%S")]
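
To see how the two variants differ, here is a small sketch on data spanning two days. The three timestamps and the column names grp_new_each_day and grp_same_each_day are made up for illustration. One thing to keep in mind is that .GRP numbers groups in order of first appearance, so the ids reflect row order and not the position of the interval within the day.

```r
library(data.table)
library(lubridate)

DT <- data.table(date = ymd_hms(c("2019-01-01 00:03:04",
                                  "2019-01-01 00:16:28",
                                  "2019-01-02 00:03:04")))

# Grouping on the full floored timestamp: the second day starts a new group
DT[, grp_new_each_day := .GRP, by = floor_date(date, "15 minutes")]

# Grouping on the time of day only: 00:00:00 on both days shares one group
DT[, grp_same_each_day := .GRP,
   by = format(floor_date(date, "15 minutes"), "%H:%M:%S")]
DT
```

If the first row of the data happened to fall at 13:47, its interval would still receive id 1 under either variant.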

A one-liner using my santoku package:

DT$date <- as.POSIXct(DT$date)
DT$interval <- santoku::chop_width(DT$date, minutes(15))
DT
##                   date                                   interval
## 1: 2019-01-01 00:03:04 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 2: 2019-01-01 00:07:03 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 3: 2019-01-01 00:15:23 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 4: 2019-01-01 00:16:28 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 5: 2019-01-01 00:21:30 [2019-01-01 00:18:04, 2019-01-01 00:33:04)

The labels can be customized; for example, try chop_width(DT$date, minutes(15), labels = lbl_dash(fmt = "%H:%M:%S"))
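
In the output above the intervals start at the earliest timestamp (00:03:04), not on a quarter-hour boundary. A possible way to align them, assuming the start argument of chop_width() behaves as I understand it, is to start the breaks at midnight of the first day. The explicit UTC time zone and the column name grp are assumptions for this sketch.

```r
library(data.table)
library(lubridate)

DT <- data.table(date = as.POSIXct(c("2019-01-01 00:03:04",
                                     "2019-01-01 00:07:03",
                                     "2019-01-01 00:15:23",
                                     "2019-01-01 00:16:28",
                                     "2019-01-01 00:21:30"), tz = "UTC"))

# Start the 15-minute breaks at midnight so they land on 00:00, 00:15, 00:30, ...
DT$interval <- santoku::chop_width(DT$date, minutes(15),
                                   start = floor_date(min(DT$date), "day"))

# The factor codes give a numeric id for the intervals present in the data
DT$grp <- as.integer(DT$interval)
DT
```

The integer codes number only the factor levels that are kept, so with data spread across a whole day they would not necessarily match a fixed 1 to 96 index.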