Mapping ymd_hms time into 15 minute time intervals
I am looking for a data.table solution to the following problem.
Suppose I have the following dataset:
library(data.table)
library(lubridate)
DT <- data.table(date = c("2019-01-01 00:03:04",
"2019-01-01 00:07:03",
"2019-01-01 00:15:23",
"2019-01-01 00:16:28",
"2019-01-01 00:21:30"))
DT
date
1: 2019-01-01 00:03:04
2: 2019-01-01 00:07:03
3: 2019-01-01 00:15:23
4: 2019-01-01 00:16:28
5: 2019-01-01 00:21:30
I would like to map each date to a 15-minute interval within its day, so that the final dataset looks like:
date date_15min_grp
1: 2019-01-01 00:03:04 1
2: 2019-01-01 00:07:03 1
3: 2019-01-01 00:15:23 2
4: 2019-01-01 00:16:28 2
5: 2019-01-01 00:21:30 2
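The groups are unique at the day level, so 2019-01-01 00:03:04 and 2019-01-02 00:03:04 would both return a value of 1. My current solution is to extract the hour/minute/second components for each day. I would then add the values up (after converting them to minutes), divide them by 60, and try to map each value to the nearest "group". But this seems slow and inelegant.
A data.table/lubridate solution would be greatly appreciated.
Thanks a lot!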
Please find below a solution using data.table and lubridate.
Reprex
- Code
library(data.table)
library(lubridate)
DT[, date_15min_grp := fcase(minute(date) < 15, 1,
minute(date) < 30, 2,
minute(date) < 45, 3,
default = 4)][]
- Output
#> date date_15min_grp
#> 1: 2019-01-01 00:03:04 1
#> 2: 2019-01-01 00:07:03 1
#> 3: 2019-01-01 00:15:23 2
#> 4: 2019-01-01 00:16:28 2
#> 5: 2019-01-01 00:21:30 2
Created on 2021-11-30 by the reprex package (v2.0.1)
As a follow-up to your comment:
- Code
library(data.table)
library(lubridate)
DT[, date_15min_grp := fcase(minute(date) < 15, hour(date)*4 + 1,
minute(date) < 30, hour(date)*4 + 2,
minute(date) < 45, hour(date)*4 + 3,
minute(date) < 60, hour(date)*4 + 4)][]
- Output
#> date date_15min_grp
#> 1: 2019-01-01 00:03:04 1
#> 2: 2019-01-01 00:07:03 1
#> 3: 2019-01-01 00:15:23 2
#> 4: 2019-01-01 00:16:28 2
#> 5: 2019-01-01 00:21:30 2
Created on 2021-12-01 by the reprex package (v2.0.1)
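The four fcase branches above follow a regular pattern, so the same group number can also be computed arithmetically. The following is a minimal sketch, not part of the original answer; the `ts` column and the two extra sample rows (13:47:10 and the 2019-01-02 row) are assumptions added only to show a later slot and a second day.

```r
library(data.table)
library(lubridate)

# Illustrative data: two rows from the question plus two assumed rows.
DT <- data.table(date = c("2019-01-01 00:03:04",
                          "2019-01-01 00:21:30",
                          "2019-01-01 13:47:10",
                          "2019-01-02 00:03:04"))

# Parse the strings once instead of on every hour()/minute() call.
DT[, ts := ymd_hms(date)]

# Four slots per hour, plus the quarter within the hour, 1-based (1..96 per day).
DT[, date_15min_grp := hour(ts) * 4L + minute(ts) %/% 15L + 1L]
DT[]
```

Because the result depends only on the time of day, the same clock time on different days lands in the same group, which is what the question asks for.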
You should check whether the slider package works for you. It can both split data by row and apply functions in the style of apply / purrr.
library(tibble)
library(slider)
library(dplyr)
# Build the timestamps from one character vector. The last two entries were
# originally written as 00:61:23 and 00:66:28, which are not valid clock times;
# they are assumed here to mean 01:01:23 and 01:06:28.
f <- data.frame(date = as.POSIXct(c(
"2019-01-01 00:03:04", "2019-01-01 00:07:03", "2019-01-01 00:15:23",
"2019-01-01 00:16:28", "2019-01-01 00:21:30", "2019-01-01 00:22:03",
"2019-01-01 00:25:23", "2019-01-01 00:36:28", "2019-01-01 00:41:30",
"2019-01-01 00:47:03", "2019-01-01 00:48:23", "2019-01-01 00:51:28",
"2019-01-01 00:51:30", "2019-01-01 00:57:03", "2019-01-01 01:01:23",
"2019-01-01 01:06:28"))) %>% arrange(date)
g <- block(f, f$date, period = "minute", every=15)
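Two things:
I think your row 5 should be in group 2, since it falls before 00:30:00 of that day. If that is not correct, please clarify.
You say the groups should be computed per day, but your data contains only one day; I'll augment it to demonstrate the per-day calculation.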
DT[, date := as.POSIXct(date)]
# copy() is needed here: := modifies DT by reference, so without it both list
# elements would be the same table and every row would end up on 2019-01-02.
DT2 <- rbindlist(list(DT, copy(DT)[, date := date + 86400]))
DT2
# date grp
# <POSc> <int>
# 1: 2019-01-01 00:03:04 1
# 2: 2019-01-01 00:07:03 1
# 3: 2019-01-01 00:15:23 2
# 4: 2019-01-01 00:16:28 2
# 5: 2019-01-01 00:21:30 2
# 6: 2019-01-02 00:03:04 1
# 7: 2019-01-02 00:07:03 1
# 8: 2019-01-02 00:15:23 2
# 9: 2019-01-02 00:16:28 2
# 10: 2019-01-02 00:21:30 2
And the group assignment:
DT2[, day := format(date, format = "%Y%m%d")
][, grp := findInterval(date, seq(lubridate::floor_date(min(date), unit = "hours"), max(date) + 3600, by = "15 mins")), by = day][]
# date grp day
# <POSc> <int> <char>
# 1: 2019-01-01 00:03:04 1 20190101
# 2: 2019-01-01 00:07:03 1 20190101
# 3: 2019-01-01 00:15:23 2 20190101
# 4: 2019-01-01 00:16:28 2 20190101
# 5: 2019-01-01 00:21:30 2 20190101
# 6: 2019-01-02 00:03:04 1 20190102
# 7: 2019-01-02 00:07:03 1 20190102
# 8: 2019-01-02 00:15:23 2 20190102
# 9: 2019-01-02 00:16:28 2 20190102
# 10: 2019-01-02 00:21:30 2 20190102
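One thing to be aware of with the findInterval approach: the breaks start at the hour of each day's first observation, so a day whose data begins at 05:10 would start counting from 1 at 05:00. If the group number should instead be the absolute quarter-hour slot of the day, the breaks can start at midnight. The following is a minimal sketch under that assumption; the sample timestamps and the explicit tz = "UTC" are illustrative and not taken from the thread.

```r
library(data.table)

# Illustrative data: the first day only starts at 05:10.
DT2 <- data.table(date = as.POSIXct(c("2019-01-01 05:10:00",
                                      "2019-01-01 05:20:00",
                                      "2019-01-02 00:03:04"), tz = "UTC"))

DT2[, day := format(date, "%Y%m%d")]

# For each day, build breaks from that day's midnight in 15-minute steps
# (97 break points delimit 96 slots) and look up each timestamp's slot.
DT2[, grp := findInterval(date,
                          seq(as.POSIXct(.BY$day, format = "%Y%m%d", tz = "UTC"),
                              by = "15 mins", length.out = 97L)),
    by = day]
DT2[]
```

With midnight-aligned breaks, 05:10 and 05:20 fall into slots 21 and 22 rather than 1 and 2.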
library(data.table)
library(lubridate)
DT <- data.table(date = c("2019-01-01 00:03:04",
"2019-01-01 00:07:03",
"2019-01-01 00:15:23",
"2019-01-01 00:16:28",
"2019-01-01 00:21:30"))
# if every new day falls in a new group
DT[, date_15min_grp := .GRP, by = floor_date(ymd_hms(date), "15 minutes")]
# if every new day needs to fall in the same group
DT[, date_15min_grp := .GRP, by = format(floor_date(ymd_hms(date), "15 minutes"), "%H:%M:%S")]
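Note that .GRP numbers groups in the order in which they first appear, so the result equals the slot number only when the data starts in the first slot and no slot is skipped. The sketch below contrasts that with a slot index derived from the same floor_date call. It is only an illustration: the column names `ts`, `grp_seen`, and `grp_slot` and the out-of-order sample rows are assumptions, not part of the answer above.

```r
library(data.table)
library(lubridate)

# Illustrative data, deliberately not in chronological order.
DT <- data.table(date = c("2019-01-01 00:21:30",
                          "2019-01-01 00:03:04",
                          "2019-01-02 00:03:04"))
DT[, ts := ymd_hms(date)]

# .GRP follows order of first appearance: the 00:15 slot is seen first.
DT[, grp_seen := .GRP, by = format(floor_date(ts, "15 minutes"), "%H:%M:%S")]

# Minutes from midnight to the start of the slot, converted to a 1-based index.
DT[, grp_slot := as.integer(difftime(floor_date(ts, "15 minutes"),
                                     floor_date(ts, "day"),
                                     units = "mins")) %/% 15L + 1L]
DT[]
```

Here `grp_seen` labels the 00:21:30 row as group 1 because it comes first, while `grp_slot` labels it 2 regardless of row order.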
A one-liner with my santoku package:
DT$date <- as.POSIXct(DT$date)
DT$interval <- santoku::chop_width(DT$date, minutes(15))
DT
## date interval
## 1: 2019-01-01 00:03:04 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 2: 2019-01-01 00:07:03 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 3: 2019-01-01 00:15:23 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 4: 2019-01-01 00:16:28 [2019-01-01 00:03:04, 2019-01-01 00:18:04)
## 5: 2019-01-01 00:21:30 [2019-01-01 00:18:04, 2019-01-01 00:33:04)
Labels can be customized; for example, try chop_width(DT$date, minutes(15), labels = lbl_dash(fmt = "%H:%M:%S"))