Trying to determine if two ranges of dates overlap using R
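I have a dataset containing information about the schools students attended during one school year, along with the dates they entered and withdrew from each school. Most students attended only one school, but some attended as many as four different schools. I want to make sure that none of the date ranges overlap. Here is an example of the data I have (the date columns are of class Date):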
|---------------------|------------------|---------------------|------------------|
| entry_date_1 | withdrawal_date_1| entry_date_2 | withdrawal_date_2|
|---------------------|------------------|---------------------|------------------|
| 2017-11-09 | 2018-05-24 | NA | NA |
|---------------------|------------------|---------------------|------------------|
| 2017-08-14 | 2017-12-15 | 2017-12-16 | 2018-05-24 |
|---------------------|------------------|---------------------|------------------|
| 2017-08-14 | 2018-06-01 | 2018-01-16 | 2018-03-20 |
|---------------------|------------------|---------------------|------------------|
| 2018-01-24 | 2018-02-25 | 2018-04-03 | 2018-05-24 |
|---------------------|------------------|---------------------|------------------|
Ideally, I would like a column of logical values like this:
|---------------------|------------------|---------------------|------------------|------------------|
| entry_date_1 | withdrawal_date_1| entry_date_2 | withdrawal_date_2| overlap? |
|---------------------|------------------|---------------------|------------------|------------------|
| 2017-11-09 | 2018-05-24 | NA | NA | NA |
|---------------------|------------------|---------------------|------------------|------------------|
| 2017-08-14 | 2017-12-15 | 2017-12-16 | 2018-05-24 | FALSE |
|---------------------|------------------|---------------------|------------------|------------------|
| 2017-08-14 | 2018-06-01 | 2018-01-16 | 2018-03-20 | TRUE |
|---------------------|------------------|---------------------|------------------|------------------|
| 2018-01-24 | 2018-02-25 | 2018-04-03 | 2018-05-24 | FALSE |
|---------------------|------------------|---------------------|------------------|------------------|
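I tried using the %overlaps% function from the DescTools package to do this, but it does not produce a logical value for any row, only NA. If anyone could help me solve this, that would be great. Any other suggestions would also be helpful. I am most comfortable with tidyverse and base R, and less comfortable with data.table.
Here is a snippet of data for a reproducible example: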
my_data <- data.frame("student_id" = 1:6,
"entry_date_1" = as.Date(c("2017-11-09","2017-08-14","2017-08-14","2018-01-24","2017-10-04","2017-08-14")),
"withdrawal_date_1" = as.Date(c("2018-05-24","2017-12-15","2018-06-01","2018-02-25","2017-11-11","2018-05-24")),
"entry_date_2" = as.Date(c(NA,"2017-12-16","2018-01-16","2018-04-03","2017-12-12",NA)),
"withdrawal_date_2" = as.Date(c(NA,"2018-05-24","2018-03-20","2018-05-24","2018-05-24",NA)))
Thanks in advance for your help!
You can use int_overlaps() from lubridate.
library(dplyr)
library(lubridate)
my_data %>%
mutate(overlap = int_overlaps(interval(entry_date_1, withdrawal_date_1),
interval(entry_date_2, withdrawal_date_2)))
# student_id entry_date_1 withdrawal_date_1 entry_date_2 withdrawal_date_2 overlap
# 1 1 2017-11-09 2018-05-24 <NA> <NA> NA
# 2 2 2017-08-14 2017-12-15 2017-12-16 2018-05-24 FALSE
# 3 3 2017-08-14 2018-06-01 2018-01-16 2018-03-20 TRUE
# 4 4 2018-01-24 2018-02-25 2018-04-03 2018-05-24 FALSE
# 5 5 2017-10-04 2017-11-11 2017-12-12 2018-05-24 FALSE
# 6 6 2017-08-14 2018-05-24 <NA> <NA> NA
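Since the question mentions base R, here is a minimal sketch of the same check without lubridate. It assumes both ranges are closed (the end dates count as enrolled days) and that each entry date is on or before its withdrawal date. The helper name ranges_overlap is made up for illustration and is not from any package.

```r
# Same sample data as in the question
my_data <- data.frame(
  student_id = 1:6,
  entry_date_1 = as.Date(c("2017-11-09", "2017-08-14", "2017-08-14",
                           "2018-01-24", "2017-10-04", "2017-08-14")),
  withdrawal_date_1 = as.Date(c("2018-05-24", "2017-12-15", "2018-06-01",
                                "2018-02-25", "2017-11-11", "2018-05-24")),
  entry_date_2 = as.Date(c(NA, "2017-12-16", "2018-01-16",
                           "2018-04-03", "2017-12-12", NA)),
  withdrawal_date_2 = as.Date(c(NA, "2018-05-24", "2018-03-20",
                                "2018-05-24", "2018-05-24", NA))
)

# Hypothetical helper: two closed ranges overlap when each one starts
# on or before the day the other one ends.
ranges_overlap <- function(start_1, end_1, start_2, end_2) {
  start_1 <= end_2 & start_2 <= end_1
}

my_data$overlap <- with(
  my_data,
  ranges_overlap(entry_date_1, withdrawal_date_1,
                 entry_date_2, withdrawal_date_2)
)

my_data$overlap
# [1]    NA FALSE  TRUE FALSE FALSE    NA
```

Rows where the second school is missing stay NA because both comparisons involve NA, which matches the int_overlaps() output above.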
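The question also mentions students with up to four schools, where comparing one pair of columns is not enough. One possible approach, sketched here in base R, is to keep one row per enrollment (long format), sort each student's enrollments by entry date, and flag the student if any enrollment starts on or before the latest withdrawal date seen so far. The enrollments data frame, its column names, and the any_overlap helper are all made up for illustration, and the sketch assumes rows with missing dates have already been dropped.

```r
# Hypothetical long-format data: one row per student-school enrollment
enrollments <- data.frame(
  student_id = c(1, 2, 2, 3, 3, 3),
  entry_date = as.Date(c("2017-11-09", "2017-08-14", "2017-12-16",
                         "2017-08-14", "2018-01-16", "2018-04-01")),
  withdrawal_date = as.Date(c("2018-05-24", "2017-12-15", "2018-05-24",
                              "2018-06-01", "2018-03-20", "2018-05-24"))
)

# Hypothetical helper: TRUE if any two closed ranges overlap,
# NA when there are fewer than two ranges to compare.
any_overlap <- function(start, end) {
  n <- length(start)
  if (n < 2) return(NA)
  ord <- order(start)
  start <- as.numeric(start[ord])
  end <- as.numeric(end[ord])
  # Latest withdrawal date among all earlier enrollments
  latest_end <- cummax(end)[-n]
  any(start[-1] <= latest_end)
}

# Apply the check to each student's enrollments
by_student <- split(enrollments, enrollments$student_id)
overlap <- vapply(
  by_student,
  function(d) any_overlap(d$entry_date, d$withdrawal_date),
  logical(1)
)

overlap
#     1     2     3 
#    NA FALSE  TRUE 
```

Using the running maximum rather than only the previous withdrawal date matters when one long enrollment spans several shorter ones, as with student 3 here.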