Replace NAs in dates with another date

Data:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03", "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Expected result:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "2013-04-14", "2013-06-04", "2014-01-03", "2014-02-21", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

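My question is similar to another one I posted, so please don't confuse the two. As you can see above, I have some missing values in the delivery dates and I want to replace them with another date. That date should be the item's order date plus the average delivery time in whole days (2 days). The average delivery time is the mean over all rows without missing values: (2 days + 1 day + 3 days + 2 days + 1 day + 2 days + 1 day + 2 days) / 8 = 1.75 days.

So I want to replace each NA in the delivery date with the order date + 2 days. Where there is no NA, the date should stay unchanged.

I have already tried this (using lubridate), but it does not work :(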

DB1$deliveryDate[is.na(DB1$deliveryDate) ] <- DB1$orderDate + days(2)

Can anybody help me?

First, convert the columns to Date objects:

DB1[,2:3]<-lapply(DB1[,2:3],as.Date)
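
The conversion also matters for the missing values themselves. A small sketch, assuming the delivery dates were entered as in the question (with the string "NA" rather than a real NA); the variable names are only illustrative:

```r
# Delivery dates as posted in the question: the gaps are the string "NA"
delivery_chr <- c("2013-01-23", "2013-03-01", "NA", "2013-06-04")

# is.na() sees ordinary strings here, so nothing is flagged as missing
sum(is.na(delivery_chr))

# as.Date() cannot parse "NA" as a date and returns a real NA for it
delivery_date <- as.Date(delivery_chr)
sum(is.na(delivery_date))
which(is.na(delivery_date))
```

So after the conversion, is.na() finds the rows that need to be filled.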

Then, replace the NA elements:

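# difftime(a, b) is a - b, so the mean below is orderDate - deliveryDate
# averaged over the rows where both dates are present
DB1$deliveryDate[is.na(DB1$deliveryDate)] <- 
       DB1$orderDate[is.na(DB1$deliveryDate)] +
       mean(difftime(DB1$orderDate,DB1$deliveryDate,units="days"),na.rm=TRUE)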
#   orderItemID  orderDate deliveryDate
#1            1 2013-01-21   2013-01-23
#2            2 2013-03-31   2013-03-01
#3            3 2013-04-12   2013-04-14
#4            4 2013-06-01   2013-06-04
#5            5 2014-01-01   2014-01-03
#6            6 2014-02-19   2014-02-21
#7            7 2014-02-27   2014-02-28
#8            8 2014-10-02   2014-10-04
#9            9 2014-10-31   2014-11-01
#10          10 2014-11-21   2014-11-23 
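
One caveat worth checking, sketched here on the data exactly as posted in the question: the mean difference is generally not a whole number of days, so the filled-in values are fractional dates. Printing hides the fraction, but comparisons against whole dates can be affected. The variable names below are only illustrative, and rounding the mean is one possible fix, not the only one.

```r
order_date <- as.Date(c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                        "2014-02-19", "2014-02-27", "2014-10-02", "2014-10-31", "2014-11-21"))
delivery_date <- as.Date(c("2013-01-23", "2013-03-01", NA, "2013-06-04", "2014-01-03",
                           NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

# Same expression as above: order minus delivery, averaged over complete rows
avg <- mean(difftime(order_date, delivery_date, units = "days"), na.rm = TRUE)
as.numeric(avg)            # 2.125 on this data (row 2 contributes +30 days)

filled <- order_date[is.na(delivery_date)] + avg
filled                     # prints as 2013-04-14 and 2014-02-21
as.numeric(filled) %% 1    # but each value carries 0.125 of a day
filled == as.Date(c("2013-04-14", "2014-02-21"))         # FALSE FALSE

# Rounding the mean to whole days first keeps the dates whole
filled_whole <- order_date[is.na(delivery_date)] + round(avg)
filled_whole == as.Date(c("2013-04-14", "2014-02-21"))   # TRUE TRUE
```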

You can do it like this:

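# Keep the ID column and convert both date columns to Date
DB1 = cbind(DB1$orderItemID, as.data.frame(lapply(DB1[-1], as.Date)))

# Mean delivery time over the complete rows, rounded to whole days
days = round(mean(DB1$deliveryDate - DB1$orderDate, na.rm = TRUE))
mask = is.na(DB1$deliveryDate)

DB1$deliveryDate[mask] = DB1$orderDate[mask] + days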

#   DB1$orderItemID  orderDate deliveryDate
#1                1 2013-01-21   2013-01-23
#2                2 2013-03-31   2013-04-01
#3                3 2013-04-12   2013-04-14
#4                4 2013-06-01   2013-06-04
#5                5 2014-01-01   2014-01-03
#6                6 2014-02-19   2014-02-21
#7                7 2014-02-27   2014-02-28
#8                8 2014-10-02   2014-10-04
#9                9 2014-10-31   2014-11-01
#10              10 2014-11-21   2014-11-23

I rearranged your data because it was not clean:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-04-01", NA, "2013-06-04", "2014-01-03", NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))
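
If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: fill_missing_delivery is a made-up name, and it assumes the dates are in a format that as.Date() parses by default and that rounding the mean to whole days is what you want.

```r
# Hypothetical helper: fill missing dates in `delivery` with `order` plus the
# mean observed gap (delivery minus order), rounded to whole days
fill_missing_delivery <- function(order, delivery) {
  order    <- as.Date(order)
  delivery <- as.Date(delivery)
  gap  <- round(mean(as.numeric(delivery - order), na.rm = TRUE))
  mask <- is.na(delivery)
  delivery[mask] <- order[mask] + gap
  delivery
}

DB1 <- data.frame(orderItemID = 1:10,
  orderDate = c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                "2014-02-19", "2014-02-27", "2014-10-02", "2014-10-31", "2014-11-21"),
  deliveryDate = c("2013-01-23", "2013-04-01", NA, "2013-06-04", "2014-01-03",
                   NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"),
  stringsAsFactors = FALSE)

DB1$deliveryDate <- fill_missing_delivery(DB1$orderDate, DB1$deliveryDate)
DB1$deliveryDate[c(3, 6)]   # the two rows that were NA
```

On the cleaned data the mean gap is 1.75 days, which rounds to 2, so rows 3 and 6 become 2013-04-14 and 2014-02-21.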

Assuming you entered your data like this (note that the NAs are not in quotes, so they are read as NA rather than "NA")...

DB1 <- data.frame(orderItemID  = 1:10,     
  orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
  deliveryDate = c("2013-01-23", "2013-03-01", NA, "2013-06-04", "2014-01-03", NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"),
  stringsAsFactors = FALSE)

...and, per Nicola's answer, did this to get the formats right...

DB1[,2:3]<-lapply(DB1[,2:3],as.Date)

...this also works:

library(lubridate)
DB1$deliveryDate <- with(DB1, as.Date(ifelse(is.na(deliveryDate), orderDate + days(2), deliveryDate), origin = "1970-01-01"))
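
In case the as.Date(..., origin = "1970-01-01") wrapper looks odd: ifelse() drops the Date class and hands back plain day counts, so the class has to be restored afterwards. A minimal base-R sketch on two made-up rows, where + 2 stands in for days(2):

```r
order_date    <- as.Date(c("2013-04-12", "2013-06-01"))
delivery_date <- as.Date(c(NA, "2013-06-04"))

raw <- ifelse(is.na(delivery_date), order_date + 2, delivery_date)
class(raw)   # "numeric": the Date class is gone
raw          # days since 1970-01-01

# Restoring the class, as in the line above
as.Date(raw, origin = "1970-01-01")

# Alternative that never leaves the Date class: assign into the missing slots
filled <- delivery_date
filled[is.na(filled)] <- order_date[is.na(delivery_date)] + 2
class(filled)   # "Date"
```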

Or you can use dplyr and pipe it:

library(lubridate)
library(dplyr)
DB2 <- DB1 %>%
  mutate(deliveryDate = as.Date(ifelse(is.na(deliveryDate), orderDate + days(2), deliveryDate),
                                origin = "1970-01-01"))
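
A possibly simpler dplyr variant, offered as a sketch rather than a drop-in replacement: coalesce() takes the first non-missing value element by element and keeps the Date class, so no origin is needed. The three-row data frame is made up for illustration, and + 2 assumes the two-day average from the question.

```r
library(dplyr)

DB1 <- data.frame(
  orderItemID  = 1:3,
  orderDate    = as.Date(c("2013-03-31", "2013-04-12", "2013-06-01")),
  deliveryDate = as.Date(c("2013-04-01", NA, "2013-06-04"))
)

# Where deliveryDate is NA, fall back to orderDate + 2 days
DB2 <- DB1 %>%
  mutate(deliveryDate = coalesce(deliveryDate, orderDate + 2))

DB2$deliveryDate
```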