Checking that a series of dates are within a series of different intervals
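This seems like it should be simple, but I'm stumped.
I'm using this tidyverse material as a guide: here
I have a list of recession periods, and I want to create a data frame as output that lists every date and whether that date falls within a recession. I'd like to keep the solution in dplyr style.
Here is a reproducible example: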
library(lubridate)
library(tidyverse)
# Sample data set
my_df <-
structure(list(recession_start = structure(c(1400, 3652, 4199,
7486, 11382, 13848), class = "Date"), recession_end = structure(c(1885,
3834, 4687, 7729, 11627, 14396), class = "Date"), recession_interval = new("Interval",
.Data = c(41904000, 15724800, 42163200, 20995200, 21168000,
47347200), start = structure(c(120960000, 315532800, 362793600,
646790400, 983404800, 1196467200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), tzone = "UTC")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
> my_df
# A tibble: 6 x 3
recession_start recession_end recession_interval
<date> <date> <Interval>
1 1973-11-01 1975-03-01 1973-11-01 UTC--1975-03-01 UTC
2 1980-01-01 1980-07-01 1980-01-01 UTC--1980-07-01 UTC
3 1981-07-01 1982-11-01 1981-07-01 UTC--1982-11-01 UTC
4 1990-07-01 1991-03-01 1990-07-01 UTC--1991-03-01 UTC
5 2001-03-01 2001-11-01 2001-03-01 UTC--2001-11-01 UTC
6 2007-12-01 2009-06-01 2007-12-01 UTC--2009-06-01 UTC
# Get every day in the range of dates
my_dates <- seq(first(my_df$recession_start), today(), by = "day")
# Create a list of intervals
recession_intervals <- list(my_df$recession_interval)
# Check to see if `my_dates` are in the intervals
recession <- my_dates %within% recession_intervals # Throws warning and does not give expected results
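I suspect this is because my list of intervals is a single list, rather than several lists as in the tidyverse example, but I'm not sure how to create multiple lists other than by typing them out manually.
The desired output would be a data frame containing each date and a TRUE/FALSE column indicating whether that date falls within a recession interval. Something like: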
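A minimal sketch of that idea, assuming a lubridate version whose `%within%` accepts a list of intervals and returns TRUE when a date falls in any of them (the list example in the lubridate documentation relies on this). `list(my_df$recession_interval)` is a list with one element, the whole six-interval vector, so the 17,000-odd dates are most likely recycled against just six intervals by position; that would explain both the warning and the wrong results. Splitting the vector into six length-1 intervals with `lapply()` builds the "multiple lists" without typing them out. The intervals are rebuilt here from the sample start/end dates, and a fixed end date replaces `today()` so the result is reproducible.

```r
library(lubridate)

# Recession boundaries from the sample data (my_df) in the question
starts <- ymd(c("1973-11-01", "1980-01-01", "1981-07-01",
                "1990-07-01", "2001-03-01", "2007-12-01"))
ends   <- ymd(c("1975-03-01", "1980-07-01", "1982-11-01",
                "1991-03-01", "2001-11-01", "2009-06-01"))
recession_interval <- interval(starts, ends)

# One list element per recession (each a length-1 Interval), instead of
# list(recession_interval), which holds a single six-interval element
recession_intervals <- lapply(seq_along(recession_interval),
                              function(i) recession_interval[i])

# Fixed end date instead of today() so the output is reproducible
my_dates <- seq(ymd("1973-11-01"), ymd("2010-12-31"), by = "day")

# Assumed lubridate behaviour: with a list, TRUE if a date is in any interval
recession <- my_dates %within% recession_intervals
recession_df <- data.frame(Date = my_dates, recession = recession)
```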
recession_df <- data.frame(Date = my_dates, recession = recession)
The output would look like this:
Date recession
1 1973-11-01 TRUE
2 1973-11-02 TRUE
3 1973-11-03 TRUE
4 1973-11-04 TRUE
5 1973-11-05 TRUE
6 1973-11-06 TRUE
7 1973-11-07 TRUE
8 1973-11-08 TRUE
9 1973-11-09 TRUE
10 1973-11-10 TRUE
Thanks for your help!
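One option is to loop over 'my_dates' with map, check whether each date is %within% any of the intervals in the 'recession_interval' column, create a tibble with the 'Date' and the logical result, and row-bind everything into a single dataset with the _dfr variant (map_dfr):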
library(purrr)
out <- map_dfr(my_dates, ~ tibble(Date = .x,
recession = any(Date %within% my_df$recession_interval)))
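To see what `any()` collapses inside that lambda: for a single date, `%within%` against the six-element `recession_interval` column returns one logical per recession, and `any()` reduces that to a single TRUE/FALSE. A small illustration, with the intervals rebuilt from the sample start/end dates and an arbitrary test date:

```r
library(lubridate)

# The six recession intervals from my_df, rebuilt from start/end dates
recession_interval <- interval(
  ymd(c("1973-11-01", "1980-01-01", "1981-07-01",
        "1990-07-01", "2001-03-01", "2007-12-01")),
  ymd(c("1975-03-01", "1980-07-01", "1982-11-01",
        "1991-03-01", "2001-11-01", "2009-06-01"))
)

d <- ymd("1980-03-15")
per_recession <- d %within% recession_interval  # one logical per recession
per_recession
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE
any(per_recession)                              # collapsed to a single value
# [1] TRUE
```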
-output
# A tibble: 17,381 x 2
Date recession
<date> <lgl>
1 1973-11-01 TRUE
2 1973-11-02 TRUE
3 1973-11-03 TRUE
4 1973-11-04 TRUE
5 1973-11-05 TRUE
6 1973-11-06 TRUE
7 1973-11-07 TRUE
8 1973-11-08 TRUE
9 1973-11-09 TRUE
10 1973-11-10 TRUE
# … with 17,371 more rows
This worked for me:
in_recession <-
tibble(date = my_dates) %>%
mutate(
recession = date %>%
map_lgl(~any(.x %within% my_df$recession_interval))
)
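Both purrr versions call `%within%` once per date, roughly 17,000 times. Because there are only six recessions, an alternative (swapped in here, not taken from the answers) is to loop over the recessions instead: compare all dates at once against each start/end pair and OR the results together. A minimal base-R sketch; the helper name `in_any_interval` is hypothetical, and both bounds are inclusive to match `%within%`. A fixed end date again stands in for `today()`.

```r
# Hypothetical helper: TRUE where x falls in at least one [starts[i], ends[i]]
in_any_interval <- function(x, starts, ends) {
  stopifnot(length(starts) == length(ends))
  # One vectorised comparison per interval (six here), not one call per date
  hits <- lapply(seq_along(starts),
                 function(i) x >= starts[i] & x <= ends[i])
  # OR the per-interval results together; FALSE is the starting value
  Reduce(`|`, hits, FALSE)
}

starts <- as.Date(c("1973-11-01", "1980-01-01", "1981-07-01",
                    "1990-07-01", "2001-03-01", "2007-12-01"))
ends   <- as.Date(c("1975-03-01", "1980-07-01", "1982-11-01",
                    "1991-03-01", "2001-11-01", "2009-06-01"))
my_dates <- seq(as.Date("1973-11-01"), as.Date("2010-12-31"), by = "day")

in_recession <- data.frame(date = my_dates,
                           recession = in_any_interval(my_dates, starts, ends))
```

Inside the pipeline from this answer, the same helper would be used as `mutate(recession = in_any_interval(date, my_df$recession_start, my_df$recession_end))`.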
We can also use the following solution, which does not need the recession_interval column:
library(purrr)
my_dates %>%
as_tibble() %>%
rowwise() %>%
mutate(fall = any(map2_lgl(my_df$recession_start, my_df$recession_end,
~ between(value, .x, .y))))
# A tibble: 17,382 x 2
# Rowwise:
value fall
<date> <lgl>
1 1973-11-01 TRUE
2 1973-11-02 TRUE
3 1973-11-03 TRUE
4 1973-11-04 TRUE
5 1973-11-05 TRUE
6 1973-11-06 TRUE
7 1973-11-07 TRUE
8 1973-11-08 TRUE
9 1973-11-09 TRUE
10 1973-11-10 TRUE
# ... with 17,372 more rows
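Since the question asks to stay in dplyr, another option (not from the answers above) is a non-equi join on the start/end columns. This sketch assumes dplyr 1.1.0 or later, where `join_by()` supports `between()`, inclusive on both ends by default. It relies on the recessions not overlapping, so each date matches at most one row; a fixed end date replaces `today()`.

```r
library(dplyr)

# Recession start/end columns as in my_df (recession_interval is not needed)
recessions <- tibble(
  recession_start = as.Date(c("1973-11-01", "1980-01-01", "1981-07-01",
                              "1990-07-01", "2001-03-01", "2007-12-01")),
  recession_end   = as.Date(c("1975-03-01", "1980-07-01", "1982-11-01",
                              "1991-03-01", "2001-11-01", "2009-06-01"))
)
my_dates <- seq(as.Date("1973-11-01"), as.Date("2010-12-31"), by = "day")

recession_df <- tibble(date = my_dates) %>%
  # Keep every date; attach the recession row whose [start, end] contains it
  left_join(recessions,
            by = join_by(between(date, recession_start, recession_end))) %>%
  # Dates outside every recession get NA in the joined columns
  mutate(recession = !is.na(recession_start)) %>%
  select(date, recession)
```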