R unexpected behaviour of bizdays::adjust.previous when checking if date is NA

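I am trying to use the bizdays package to convert the dates in a data frame to business days. The data frame may contain missing values (NA), so I added an ifelse statement to skip those empty cells, but it seems to break the code and I don't know why.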

Here is a minimal example of the problem:

library(bizdays)
library(dplyr)

holidays <- c("2022-03-01",
              "2022-03-07",
              "2022-03-08",
              "2022-03-25")

start_date = as.Date("01/01/2010", format = "%d/%m/%Y")
end_date   = as.Date("01/01/2060", format = "%d/%m/%Y")

calendar <- create.calendar("my_cal",
                            holidays =  holidays,
                            weekdays =c("saturday", "sunday"),
                            start.date = start_date,
                            end.date = end_date)

bizdays.options$set(default.calendar="my_cal")


date_1 <- "2022-03-13" # sunday
print(adjust.previous(date_1)) # friday "2022-03-11"

days <- c()
for (i in c(1:31)) {
  days <- c(days, paste("2022-03-", formatC(i, width = 2, flag = '0'), sep = ""))
}

df <- data.frame(days = days)

df_1 <- df %>% mutate(days_1 = adjust.previous(days))

head(df_1) # correct
#        days     days_1
#1 2022-03-01 2022-02-28
#2 2022-03-02 2022-03-02
#3 2022-03-03 2022-03-03
#4 2022-03-04 2022-03-04
#5 2022-03-05 2022-03-04
#6 2022-03-06 2022-03-04

df_2 <- df %>% mutate(days_2 = ifelse(is.na(days),
                                      days,
                                      adjust.previous(days)))

head(df_2) # date is converted to a number
#        days days_2
#1 2022-03-01  19051
#2 2022-03-02  19053
#3 2022-03-03  19054
#4 2022-03-04  19055
#5 2022-03-05  19055
#6 2022-03-06  19055

This has nothing to do with the bizdays package. It comes from how ifelse() returns objects of class Date as plain numbers. See this example:

class(Sys.Date()) # Date
ifelse(TRUE, Sys.Date(), Sys.Date()) # 19066
class(ifelse(TRUE, Sys.Date(), Sys.Date())) # numeric

By contrast, a plain if keeps the class:

if(TRUE) class(Sys.Date()) # Date
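
The reason is documented behaviour: ifelse() builds its result from the test vector, so the class attribute of the yes/no values is not carried over and only the underlying day counts remain. If you do want to keep ifelse(), one way to get dates back is to re-attach the class afterwards. This is a minimal base R sketch; the vector d and the names raw and fixed are made up for illustration:

```r
# Base R only; no packages needed.
d <- as.Date(c("2022-03-01", NA, "2022-03-05"))

# ifelse() takes its attributes from the test (a plain logical vector),
# so the Date class of `d` is dropped and only day counts remain.
raw <- ifelse(is.na(d), NA, d)
class(raw)  # "numeric"

# Re-attach the class: a Date is stored as days since 1970-01-01.
fixed <- as.Date(raw, origin = "1970-01-01")
class(fixed)  # "Date"
```

The same conversion applied to the days_2 column from the question should turn the numbers back into dates.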

In your case, it looks to me like ifelse() is unnecessary, because adjust.previous handles NA values:

df$days[1] = NA
df_2 <- df %>% mutate(
    days_2 = adjust.previous(days)
)

# Seems to work
head(df_2)
#         days     days_2
# 1       <NA>       <NA>
# 2 2022-03-02 2022-03-02
# 3 2022-03-03 2022-03-03
# 4 2022-03-04 2022-03-04
# 5 2022-03-05 2022-03-04
# 6 2022-03-06 2022-03-04

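However, if this does not work on your real data, I would step out of the dplyr world (which is nice, but somewhat weaker when it comes to subsetting columns) and do it in base R: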

df_3 <- df
df_3$days_3 <- as.Date(0, origin = "1970-01-01") # Create a Date column
df_3$days_3[is.na(df_3$days)] <- NA # Fill NA
df_3$days_3[!is.na(df_3$days)] <- adjust.previous(df_3$days[!is.na(df_3$days)]) # Fill values

# Output as above
head(df_3)
#         days     days_3
# 1       <NA>       <NA>
# 2 2022-03-02 2022-03-02
# 3 2022-03-03 2022-03-03
# 4 2022-03-04 2022-03-04
# 5 2022-03-05 2022-03-04
# 6 2022-03-06 2022-03-04
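
If you need this pattern in several places, the subset-and-assign steps above could be wrapped in a small helper. This is only a sketch: apply_non_na and previous_weekday are hypothetical names, and previous_weekday is a simplified stand-in for adjust.previous (it rolls weekends back to Friday and ignores holidays) so that the example runs without bizdays.

```r
# Hypothetical helper: apply `f` only to the non-NA dates and keep the
# result as a Date vector (no ifelse(), so the class is never dropped).
apply_non_na <- function(x, f) {
  x <- as.Date(x)
  out <- rep(as.Date(NA), length(x))  # Date vector pre-filled with NA
  keep <- !is.na(x)
  out[keep] <- f(x[keep])
  out
}

# Stand-in for adjust.previous: roll Saturday/Sunday back to Friday.
previous_weekday <- function(d) {
  wd <- as.POSIXlt(d)$wday  # 0 = Sunday, 6 = Saturday
  d - ifelse(wd == 0, 2, ifelse(wd == 6, 1, 0))
}

res <- apply_non_na(c(NA, "2022-03-04", "2022-03-05", "2022-03-06"),
                    previous_weekday)
print(res)  # NA, then 2022-03-04 three times (Fri stays, Sat/Sun roll back)
```

With bizdays loaded and the calendar set up as in the question, passing adjust.previous as f should give the same result as the df_3 code above.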