Breaking up observations across multiple dates in R
I am looking for a data.table + lubridate solution to the following problem.
Suppose I have the following dataset:
library(data.table)
library(lubridate)
library(magrittr)
sample <- data.table(start = c("2018-12-22 23:00:00",
                               "2018-12-23 06:00:00",
                               "2018-12-22 06:00:00",
                               "2018-12-23 06:00:00"),
                     end = c("2018-12-23 06:00:00",
                             "2018-12-23 13:00:00",
                             "2018-12-23 12:00:00",
                             "2018-12-24 01:00:00"),
                     store = c("A", "A", "B", "B"),
                     var = 1:4)
sample[, start := ymd_hms(start)]
sample[, end := ymd_hms(end)]
The data looks like this:
> sample
                 start                 end store var
1: 2018-12-22 23:00:00 2018-12-23 06:00:00     A   1
2: 2018-12-23 06:00:00 2018-12-23 13:00:00     A   2
3: 2018-12-22 06:00:00 2018-12-23 12:00:00     B   3
4: 2018-12-23 06:00:00 2018-12-24 01:00:00     B   4
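Note that rows 1, 3 and 4 contain observations that span more than one date. I would like to split these observations so that each one covers only a single calendar date. The start and end may also be several days apart, but I want one observation per date. For the example above, the data.table should look like this: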
                 start                 end store var
1: 2018-12-22 23:00:00 2018-12-22 23:59:59     A   1
2: 2018-12-23 00:00:00 2018-12-23 06:00:00     A   1
3: 2018-12-23 06:00:00 2018-12-23 13:00:00     A   2
4: 2018-12-22 06:00:00 2018-12-22 23:59:59     B   3
5: 2018-12-23 00:00:00 2018-12-23 12:00:00     B   3
6: 2018-12-23 06:00:00 2018-12-23 23:59:59     B   4
7: 2018-12-24 00:00:00 2018-12-24 01:00:00     B   4
Importantly, the var variable must stay the same when an observation is split across multiple dates.
Thanks!
# expand the rows: one row per calendar date covered by each observation
sample = sample[sample[, .(date = seq(as.IDate(start), as.IDate(end), 1)), by = var], on = "var"]
# fix the times: keep the original start/end on the first/last date,
# otherwise clamp to the boundaries of that date
sample[, `:=`(
  start = fifelse(as.IDate(start) == date,
                  ymd_hms(paste(as.Date(start), as.ITime(start))),
                  ymd_hms(paste(date, "00:00:00"))),
  end = fifelse(as.IDate(end) == date,
                ymd_hms(paste(as.Date(end), as.ITime(end))),
                ymd_hms(paste(date, "23:59:59"))),
  date = NULL
)]
Output:
                 start                 end store var
1: 2018-12-22 23:00:00 2018-12-22 23:59:59     A   1
2: 2018-12-23 00:00:00 2018-12-23 06:00:00     A   1
3: 2018-12-23 06:00:00 2018-12-23 13:00:00     A   2
4: 2018-12-22 06:00:00 2018-12-22 23:59:59     B   3
5: 2018-12-23 00:00:00 2018-12-23 12:00:00     B   3
6: 2018-12-23 06:00:00 2018-12-23 23:59:59     B   4
7: 2018-12-24 00:00:00 2018-12-24 01:00:00     B   4
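Using a simple helper function,
library(lubridate)
func <- function(st, en) {
  # midnights from the start of the first day to the end of the last day
  days <- seq(floor_date(min(st), unit = "days"),
              ceiling_date(max(en), unit = "days"),
              by = "1 day")
  # replace the two outer midnights with the actual start and end
  days <- c(st, days[-c(1, length(days))], en)
  # consecutive break points form the (start, end) pairs
  list(days[-length(days)], days[-1])
}
we get: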
library(data.table)
sample[, setNames(func(start, end), c("start", "end")), by = .(store, var)]
#     store   var               start                 end
#    <char> <int>              <POSc>              <POSc>
# 1:      A     1 2018-12-22 23:00:00 2018-12-23 00:00:00
# 2:      A     1 2018-12-23 00:00:00 2018-12-23 06:00:00
# 3:      A     2 2018-12-23 06:00:00 2018-12-23 13:00:00
# 4:      B     3 2018-12-22 06:00:00 2018-12-23 00:00:00
# 5:      B     3 2018-12-23 00:00:00 2018-12-23 12:00:00
# 6:      B     4 2018-12-23 06:00:00 2018-12-24 00:00:00
# 7:      B     4 2018-12-24 00:00:00 2018-12-24 01:00:00
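Note that the helper above ends each non-final piece at 00:00:00 of the next day, whereas the question asks for 23:59:59. A minimal sketch of one way to get the 23:59:59 form follows. It assumes each (store, var) group holds exactly one row, that times have whole-second resolution, and that the time zone is UTC as produced by ymd_hms. The names split_by_day, obs and res are hypothetical and not part of data.table or lubridate.

```r
library(data.table)
library(lubridate)

# Hypothetical helper: split one interval [st, en] at every midnight it crosses.
# Assumes st and en are length-one POSIXct values with st <= en.
split_by_day <- function(st, en) {
  first_cut <- floor_date(st, unit = "day") + days(1)  # first midnight after st
  last_cut  <- floor_date(en, unit = "day")            # last midnight at or before en
  cuts <- if (first_cut <= last_cut) {
    seq(first_cut, last_cut, by = "1 day")
  } else {
    st[0]  # zero-length POSIXct: the interval stays within a single day
  }
  cuts <- cuts[cuts < en]  # avoid an empty piece when en is exactly midnight
  # each piece before a midnight ends one second earlier, i.e. at 23:59:59
  list(start = c(st, cuts), end = c(cuts - 1, en))
}

# Same data as in the question, under a different name to avoid masking base::sample
obs <- data.table(
  start = ymd_hms(c("2018-12-22 23:00:00", "2018-12-23 06:00:00",
                    "2018-12-22 06:00:00", "2018-12-23 06:00:00")),
  end   = ymd_hms(c("2018-12-23 06:00:00", "2018-12-23 13:00:00",
                    "2018-12-23 12:00:00", "2018-12-24 01:00:00")),
  store = c("A", "A", "B", "B"),
  var   = 1:4
)

res <- obs[, split_by_day(start, end), by = .(store, var)]
print(res)
```

Because the grouping is by (store, var), var is carried over unchanged to every piece of a split observation. If var is not unique per row, a row identifier would have to be added to the by clause first.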