Match dates against date intervals (holiday periods) in multiple years
Data:
DB <- data.frame(orderID = c(1,2,3,4,5,6,7,8,9,10),
                 orderDate = c("1.1.14","8.4.14","17.4.14","29.3.12","29.7.14",
                               "2.8.14","21.9.14","4.10.14","30.11.14","9.4.06"),
                 stringsAsFactors = FALSE)
Expected result [I hope I counted the days correctly]:
orderDuringPresentShoppingWeekseasternpast = c("No", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "Yes")
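Hi guys,
I think this is the most complex/difficult question I have asked so far, but maybe somebody is smarter than me and can solve it in a minute :)
I have several time spans that contain the dates of the public Easter holidays, not only for this year but also for the past 10 years. As everybody knows, Easter falls on a different date every year, so I cannot pin it to a fixed calendar date.
1. I want a "Yes" if the order happened on Easter or in the 14 days before Easter in any of the past years, and a "No" otherwise. I have already built the time spans for the past 10 years: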
spanEasternpast
[1] 2015-03-22 UTC--2015-04-05 UTC 2014-04-06 UTC--2014-04-20 UTC 2013-03-17 UTC--2013-03-31 UTC 2012-03-25 UTC--2012-04-08 UTC
[5] 2011-04-10 UTC--2011-04-24 UTC 2010-03-21 UTC--2010-04-04 UTC 2009-03-29 UTC--2009-04-12 UTC 2008-03-09 UTC--2008-03-23 UTC
[9] 2007-03-25 UTC--2007-04-08 UTC 2006-04-02 UTC--2006-04-16 UTC 2005-03-13 UTC--2005-03-27 UTC
I already tried it like this, but it does not work:
Easternpast <- Easter(currentYear:(currentYear -10))
spanEasternpast <- new_interval (ymd(Easternpast-ddays(14)), ymd(Easternpast))
spanEasternpast
[1] 2015-03-22 UTC--2015-04-05 UTC 2014-04-06 UTC--2014-04-20 UTC 2013-03-17 UTC--2013-03-31 UTC 2012-03-25 UTC--2012-04-08 UTC
[5] 2011-04-10 UTC--2011-04-24 UTC 2010-03-21 UTC--2010-04-04 UTC 2009-03-29 UTC--2009-04-12 UTC 2008-03-09 UTC--2008-03-23 UTC
[9] 2007-03-25 UTC--2007-04-08 UTC 2006-04-02 UTC--2006-04-16 UTC 2005-03-13 UTC--2005-03-27 UTC
(so this part with the right span is working)
DB$orderDuringPresentShoppingWeekseasternpast <- ifelse(DB$orderDate%within%spanEasternpast == TRUE, "Yes", "No")
I hope you can tell me what is going wrong, or show me another way to solve the problem.
Cheers and thanks!
library(lubridate)
library(timeDate) # For function Easter() above
DB$orderDuringPresentShoppingWeekeasternpast <- apply(sapply(dmy(DB$orderDate), function(x) x %within% spanEasternpast), 2, any)
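Why this works: consider the two steps. First, each order date is tested against every interval, which gives one column per order and one row per interval: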
sapply(dmy(DB$orderDate), function(x) x %within% spanEasternpast)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [2,] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
# [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
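Then apply(x, MARGIN = 2, any) checks column by column whether any entry is TRUE, that is, whether the order falls into at least one of the intervals.
Here is another possibility, using foverlaps from the data.table package: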
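The two steps above can also be sketched in base R without lubridate. This is only an illustration under a few assumptions: the Easter Sundays are copied by hand from the intervals printed in the question (only the three years that matter for the sample orders), and the names easter, window_start, hit, and flag are made up for this example.

```r
# Easter Sundays copied from the intervals printed in the question
# (assumption: only the years needed for the sample orders are listed)
easter       <- as.Date(c("2014-04-20", "2012-04-08", "2006-04-16"))
window_start <- easter - 14

# A few of the order dates from the question, parsed as day.month.year
order_date <- as.Date(c("1.1.14", "8.4.14", "17.4.14", "29.3.12", "9.4.06"),
                      format = "%d.%m.%y")

# Step 1: one column per order, one row per interval
hit <- sapply(order_date, function(d) d >= window_start & d <= easter)

# Step 2: an order counts if any interval (row) in its column matched
flag <- ifelse(apply(hit, MARGIN = 2, any), "Yes", "No")
flag
# [1] "No"  "Yes" "Yes" "Yes" "Yes"
```

Parsing the strings into dates first matters: the orderDate column holds character values such as "8.4.14", and a character value cannot be compared against an interval.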
library(timeDate)
library(data.table)
# first, some dummy data
# create easter interval for 2000-2002
easter <- data.table(start = as.Date(Easter(2000:2002, -14)),
end = as.Date(Easter(2000:2002)))
# start end
# 1: 2000-04-09 2000-04-23
# 2: 2001-04-01 2001-04-15
# 3: 2002-03-17 2002-03-31
# set key for overlap join
setkey(easter)
# create some order dates
# 3 dates before (>14 days) easter holiday
# 3 dates during holiday
set.seed(1)
order <- data.table(order_date = as.Date(Easter(2000:2002)) + sample(c(-17:-15, -2:0)))
# create an 'end date' for the order_date
order[, order_date2 := order_date]
# overlap join
# use nomatch = NA (default) to keep track of dates within and outside holiday period
# convert NA to "No" and non-NA to "Yes" using vector indexing
# remove columns (I deliberately kept start and end just to check the join)
order <- foverlaps(x = order, y = easter, by.x = names(order),
type = "within", mult = "all", nomatch = NA)[
, easter_order := c("Yes", "No")[as.integer(is.na(end)) + 1]][
, order_date2 := NULL]
order
# start end order_date easter_order
# 1: 2000-04-09 2000-04-23 2000-04-23 Yes
# 2: 2001-04-01 2001-04-15 2001-04-13 Yes
# 3: <NA> <NA> 2002-03-16 No
# 4: <NA> <NA> 2000-04-06 No
# 5: 2001-04-01 2001-04-15 2001-04-14 Yes
# 6: <NA> <NA> 2002-03-15 No
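Please see this nice answer by @Arun, where foverlaps is described in more detail.
Update following a comment from the OP
Matching dates against both the Easter and the Christmas holiday periods: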
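If data.table is not available, a similar lookup can be sketched in base R with findInterval(). This is a different technique from foverlaps, not a description of how foverlaps works internally, and it assumes the intervals are sorted by start date and do not overlap. The interval and order dates are copied from the example output above; idx, hit, and easter_order are names chosen for this sketch.

```r
# Easter windows for 2000-2002, as printed in the example above
start <- as.Date(c("2000-04-09", "2001-04-01", "2002-03-17"))
end   <- as.Date(c("2000-04-23", "2001-04-15", "2002-03-31"))

# Four of the order dates from the example output
order_date <- as.Date(c("2000-04-23", "2001-04-13", "2002-03-16", "2000-04-06"))

# Index of the last interval whose start is <= the order date (0 = none)
idx <- findInterval(as.numeric(order_date), as.numeric(start))

# An order is inside a window only if it also lies on or before that window's end
hit <- idx > 0
hit[hit] <- order_date[hit] <= end[idx[hit]]

easter_order <- ifelse(hit, "Yes", "No")
easter_order
# [1] "Yes" "Yes" "No"  "No"
```

Because findInterval() uses a binary search, this avoids testing every date against every interval, but it gives wrong answers if the windows overlap or are unsorted.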
# create intervals for easter and christmas holidays 2000-2002
holiday <- data.table(start = c(as.Date(Easter(2000:2002, -14)),
as.Date(ChristmasDay(year = 2000:2002)) - 14),
end = c(as.Date(Easter(2000:2002)),
as.Date(ChristmasDay(year = 2000:2002))))
# holiday
# set key for overlap join
setkey(holiday)
# create some order dates
# 3 dates before (>14 days) and 3 during easter holiday
# 3 dates before (>14 days) and 3 during christmas holiday
set.seed(1)
order <- data.table(order_date = c(as.Date(Easter(2000:2002)) + sample(c(-17:-15, -2:0)),
as.Date(ChristmasDay(2000:2002)) + sample(c(-17:-15, -2:0))))
# create a 'end' date for the order
order[, order_date2 := order_date]
# overlap join
# use nomatch = NA (default) to keep track of dates within and outside holiday period
# convert NA to "No" and non-NA to "Yes"
# remove columns (I deliberately kept start and end just to check the join)
order <- foverlaps(x = order, y = holiday, by.x = names(order),
type = "within", mult = "all", nomatch = NA)[
, holiday_order := c("Yes", "No")[as.integer(is.na(end)) + 1]][
, order_date2 := NULL]
order
# start end order_date holiday_order
# 1: <NA> <NA> 2000-04-07 No
# 2: 2001-04-01 2001-04-15 2001-04-15 Yes
# 3: <NA> <NA> 2002-03-16 No
# 4: 2000-04-09 2000-04-23 2000-04-21 Yes
# 5: <NA> <NA> 2001-03-29 No
# 6: 2002-03-17 2002-03-31 2002-03-30 Yes
# 7: 2000-12-11 2000-12-25 2000-12-25 Yes
# 8: 2001-12-11 2001-12-25 2001-12-23 Yes
# 9: <NA> <NA> 2002-12-10 No
# 10: <NA> <NA> 2000-12-08 No
# 11: 2001-12-11 2001-12-25 2001-12-24 Yes
# 12: <NA> <NA> 2002-12-09 No
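The same idea extends to any number of holiday windows. Below is a minimal base R sketch that needs no packages; the helper in_any_window() is hypothetical (it is not part of data.table or timeDate), and the holiday dates are copied from the output above rather than computed.

```r
# Hypothetical helper: TRUE if a date lies in at least one [start, end] window
in_any_window <- function(dates, start, end) {
  vapply(seq_along(dates),
         function(i) any(dates[i] >= start & dates[i] <= end),
         logical(1))
}

# Easter and Christmas 2000-2002; each window covers the 14 days before the holiday
holiday_end   <- as.Date(c("2000-04-23", "2001-04-15", "2002-03-31",
                           "2000-12-25", "2001-12-25", "2002-12-25"))
holiday_start <- holiday_end - 14

# Four order dates taken from the output above
order_date <- as.Date(c("2000-12-25", "2002-12-10", "2001-03-29", "2001-04-15"))

holiday_order <- ifelse(in_any_window(order_date, holiday_start, holiday_end),
                        "Yes", "No")
holiday_order
# [1] "Yes" "No"  "No"  "Yes"
```

This compares every order against every window, so it is fine for a few thousand rows; for large tables the overlap join shown above is likely the better choice.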