如何填写两个日期之间的日期
How to fill dates between two dates
这是我当前的数据框:
df <- data.frame(name = c("A", "A", "A", "B", "B")),
start_date = c("2020-01-23", "2019-10-15", "2019-07-28", "2020-03-15", "2019-04-23")),
end_date = c("2020-05-15", "2020-01-27", "2019-10-17", "2020-07-25", "2020-02-13")),
value = c(8.1, 3.3, 9.1, 9.4, 15.3)))
name start_date end_date value
A 2020-01-23 2020-05-15 8
A 2019-10-15 2020-01-27 3
A 2019-07-28 2019-10-17 9
B 2020-03-15 2020-07-25 9
B 2019-04-23 2020-02-13 15
日期为 POSIXct,不一定连续,可以重叠。
我希望我的输出数据框看起来像这样:
name date value
A 2020-01-23 8.1
A 2020-01-24 8.1
A ... 8.1
A 2020-05-14 8.1
A 2020-05-15 8.1
A 2019-10-15 3.3
A 2019-10-16 3.3
A ... 3.3
A 2020-01-26 3.3
A 2020-01-27 3.3
A 2019-07-28 9.1
A 2019-07-29 9.1
A ... 9.1
A 2019-10-16 9.1
A 2019-10-17 9.1
B 2020-03-15 9.4
B 2020-03-16 9.4
B ... 9.4
B 2020-07-24 9.4
B 2020-07-25 9.4
B 2019-04-23 15.3
B 2019-04-24 15.3
B ... 15.3
B 2020-02-12 15.3
B 2020-02-13 15.3
这是我一直在尝试的:
library(data.table)
setDT(df) [, .(date = seq(as.Date(start_date), as.Date(end_date), by = "day")), by = end_date]
但我一直收到以下错误:
Error in seq.Date(as.Date(start_date), as.Date(end_date), by = "day") :
'from' must be of length 1
我应该怎么做?我愿意使用其他包而不是 data.table 如果它们工作得更好。
这里,我们可能需要使用by
作为行的序列
library(data.table)
setDT(df)[, .(date = seq(as.Date(start_date), as.Date(end_date),
by = 'day')), .(rn = seq_len(nrow(df)), name, value)][, rn := NULL][]
或通过遍历 'start_date'、'end_date' 的相应元素来创建 list
列,以在 Map
和 unnest
中创建日期序列list
library(tidyr)
library(magrittr)
setDT(df)[, .(name, date = Map(seq, MoreArgs = list(by = '1 day'),
as.Date(start_date), as.Date(end_date)), value)] %>%
unnest(date)
# A tibble: 731 x 3
# name date value
# <chr> <date> <dbl>
# 1 A 2020-01-23 8.1
# 2 A 2020-01-24 8.1
# 3 A 2020-01-25 8.1
# 4 A 2020-01-26 8.1
# 5 A 2020-01-27 8.1
# 6 A 2020-01-28 8.1
# 7 A 2020-01-29 8.1
# 8 A 2020-01-30 8.1
# 9 A 2020-01-31 8.1
#10 A 2020-02-01 8.1
# … with 721 more rows
另一种方法使用purrr
df <- data.frame(name = c("A", "A", "A", "B", "B"),
start_date = c("2020-01-23", "2019-10-15", "2019-07-28", "2020-03-15", "2019-04-23"),
end_date = c("2020-05-15", "2020-01-27", "2019-10-17", "2020-07-25", "2020-02-13"),
value = c(8.1, 3.3, 9.1, 9.4, 15.3))
library(dplyr)
library(purrr)
# function take in the name, start, end, value and generate a df fill as wanted
generate_fill <- function(name, start, end, value) {
tibble(name = name,
date = seq(as.Date(start), as.Date(end), by = "1 day"),
value = value)
}
# Map the function to original df and combine the result
bind_rows(
pmap(list(df[["name"]], df[["start_date"]], df[["end_date"]], df[["value"]]),
generate_fill))
输出
# A tibble: 731 x 3
name date value
<chr> <date> <dbl>
1 A 2020-01-23 8.1
2 A 2020-01-24 8.1
3 A 2020-01-25 8.1
4 A 2020-01-26 8.1
5 A 2020-01-27 8.1
6 A 2020-01-28 8.1
7 A 2020-01-29 8.1
8 A 2020-01-30 8.1
9 A 2020-01-31 8.1
10 A 2020-02-01 8.1
# … with 721 more rows
这是我当前的数据框:
df <- data.frame(name = c("A", "A", "A", "B", "B")),
start_date = c("2020-01-23", "2019-10-15", "2019-07-28", "2020-03-15", "2019-04-23")),
end_date = c("2020-05-15", "2020-01-27", "2019-10-17", "2020-07-25", "2020-02-13")),
value = c(8.1, 3.3, 9.1, 9.4, 15.3)))
name start_date end_date value
A 2020-01-23 2020-05-15 8
A 2019-10-15 2020-01-27 3
A 2019-07-28 2019-10-17 9
B 2020-03-15 2020-07-25 9
B 2019-04-23 2020-02-13 15
日期为 POSIXct,不一定连续,可以重叠。
我希望我的输出数据框看起来像这样:
name date value
A 2020-01-23 8.1
A 2020-01-24 8.1
A ... 8.1
A 2020-05-14 8.1
A 2020-05-15 8.1
A 2019-10-15 3.3
A 2019-10-16 3.3
A ... 3.3
A 2020-01-26 3.3
A 2020-01-27 3.3
A 2019-07-28 9.1
A 2019-07-29 9.1
A ... 9.1
A 2019-10-16 9.1
A 2019-10-17 9.1
B 2020-03-15 9.4
B 2020-03-16 9.4
B ... 9.4
B 2020-07-24 9.4
B 2020-07-25 9.4
B 2019-04-23 15.3
B 2019-04-24 15.3
B ... 15.3
B 2020-02-12 15.3
B 2020-02-13 15.3
这是我一直在尝试的:
library(data.table)
setDT(df) [, .(date = seq(as.Date(start_date), as.Date(end_date), by = "day")), by = end_date]
但我一直收到以下错误:
Error in seq.Date(as.Date(start_date), as.Date(end_date), by = "day") :
'from' must be of length 1
我应该怎么做?我愿意使用其他包而不是 data.table 如果它们工作得更好。
这里,我们可能需要使用by
作为行的序列
library(data.table)
setDT(df)[, .(date = seq(as.Date(start_date), as.Date(end_date),
by = 'day')), .(rn = seq_len(nrow(df)), name, value)][, rn := NULL][]
或通过遍历 'start_date'、'end_date' 的相应元素来创建 list
列,以在 Map
和 unnest
中创建日期序列list
library(tidyr)
library(magrittr)
setDT(df)[, .(name, date = Map(seq, MoreArgs = list(by = '1 day'),
as.Date(start_date), as.Date(end_date)), value)] %>%
unnest(date)
# A tibble: 731 x 3
# name date value
# <chr> <date> <dbl>
# 1 A 2020-01-23 8.1
# 2 A 2020-01-24 8.1
# 3 A 2020-01-25 8.1
# 4 A 2020-01-26 8.1
# 5 A 2020-01-27 8.1
# 6 A 2020-01-28 8.1
# 7 A 2020-01-29 8.1
# 8 A 2020-01-30 8.1
# 9 A 2020-01-31 8.1
#10 A 2020-02-01 8.1
# … with 721 more rows
另一种方法使用purrr
df <- data.frame(name = c("A", "A", "A", "B", "B"),
start_date = c("2020-01-23", "2019-10-15", "2019-07-28", "2020-03-15", "2019-04-23"),
end_date = c("2020-05-15", "2020-01-27", "2019-10-17", "2020-07-25", "2020-02-13"),
value = c(8.1, 3.3, 9.1, 9.4, 15.3))
library(dplyr)
library(purrr)
# function take in the name, start, end, value and generate a df fill as wanted
generate_fill <- function(name, start, end, value) {
tibble(name = name,
date = seq(as.Date(start), as.Date(end), by = "1 day"),
value = value)
}
# Map the function to original df and combine the result
bind_rows(
pmap(list(df[["name"]], df[["start_date"]], df[["end_date"]], df[["value"]]),
generate_fill))
输出
# A tibble: 731 x 3
name date value
<chr> <date> <dbl>
1 A 2020-01-23 8.1
2 A 2020-01-24 8.1
3 A 2020-01-25 8.1
4 A 2020-01-26 8.1
5 A 2020-01-27 8.1
6 A 2020-01-28 8.1
7 A 2020-01-29 8.1
8 A 2020-01-30 8.1
9 A 2020-01-31 8.1
10 A 2020-02-01 8.1
# … with 721 more rows