A new data conversion issue in R

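I have a data column that contains values in two date formats: numeric values (e.g. "38169") and string values (e.g. "01/03/2004", always in the format "%d/%m/%Y"). I can't manage to convert both kinds to the same Date type and output them in the standard date format "%Y-%m-%d".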

In the example below, I want to convert the variable date_first into the variable date_clean.

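Additional info:

The database was imported from Excel. The two date formats are the result of how the values were changed to strings between Excel and R.
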
data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
                  date_first=c(NA,"38169","37926","01/03/2004"))
                  
desidered_data <- data.frame(date_all=c(NA,"38169","37926","01/03/2004 --- 01/03/2004"),
                  date_first=c(NA,"38169","37926","01/03/2004"),
                  date_clean=as.Date(c(NA,"2004-07-01","2003-11-01","2004-03-01")))

> desidered_data
                   date_all date_first date_clean
1                      <NA>       <NA>       <NA>
2                     38169      38169 2004-07-01
3                     37926      37926 2003-11-01
4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01


You can use case_when for this:

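library(dplyr)

data %>%
  # five-digit strings are Excel serial dates; everything else is parsed as dd/mm/yyyy
  mutate(date_clean = case_when(grepl("^\\d{5}$", date_first) ~ as.Date(as.numeric(date_first), origin = "1899-12-30"),
                                TRUE ~ as.Date(date_first, format = "%d/%m/%Y")))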
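
The condition in case_when is just a regular expression test on the text of each value. A minimal base R sketch of that test, using the date_first values from the question (the names x and is_serial are only for illustration): in an R string literal the backslash has to be doubled, so the regex \d is written "\\d", and anchoring with ^ and $ means the whole value must be five digits.

```r
# Values of date_first from the question (illustrative copy)
x <- c(NA, "38169", "37926", "01/03/2004")

# "\\d" in R source is the regex \d; ^ and $ require the whole
# value to consist of exactly five digits
is_serial <- grepl("^\\d{5}$", x)
is_serial
# [1] FALSE  TRUE  TRUE FALSE
```

Note that grepl returns FALSE for NA, so missing values fall through to the string-parsing branch and come out as NA.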

A base R option:

change_mix_date <- function(x) {
  # Date vector of NAs, one per input value, to store the results
  new_date <- rep(as.Date(NA), length(x))
  # Values made up only of digits are Excel serial dates
  inds <- grepl('^\\d+$', x)
  # Convert Excel serial numbers to Date class
  new_date[inds] <- as.Date(as.numeric(x[inds]), origin = '1899-12-30')
  # Parse the remaining values as day/month/year strings
  new_date[!inds] <- as.Date(x[!inds], format = '%d/%m/%Y')
  # Return the result
  new_date
}

data$date_clean <- change_mix_date(data$date_first)

#                   date_all date_first date_clean
#1                      <NA>       <NA>       <NA>
#2                     38169      38169 2004-07-01
#3                     37926      37926 2003-11-01
#4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01
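
Both answers rely on origin = "1899-12-30". As a quick sanity check, here is a small sketch that converts the two serial numbers from the question on their own. It assumes the workbook uses Excel's default 1900 date system (a workbook saved with the 1904 date system would need a different origin); the names serials and dates are only for illustration.

```r
# Excel serial numbers from the question, stored as text
serials <- c("38169", "37926")

# Assumption: 1900 date system, for which day 0 is effectively 1899-12-30
dates <- as.Date(as.numeric(serials), origin = "1899-12-30")

# Date objects print as %Y-%m-%d by default; format() makes that explicit
format(dates, "%Y-%m-%d")
# [1] "2004-07-01" "2003-11-01"
```

The result is a Date vector, so the "%Y-%m-%d" layout asked for in the question is simply how R prints it; format() is only needed if you want character strings.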