A new data conversion issue in R
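I have a data column that contains dates in two formats: numeric values (e.g. "38169") and strings (e.g. "01/03/2004", always in the format "%d/%m/%Y"). I can't manage to convert both into the same date type and then express them in the standard "%Y-%m-%d" format.
For the example below, I want to convert the variable date_first into the variable date_clean.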
Additional information:
The database was imported from Excel.
The two date formats are the result of string conversions that happened in Excel and R.
data <- data.frame(date_all = c(NA, "38169", "37926", "01/03/2004 --- 01/03/2004"),
                   date_first = c(NA, "38169", "37926", "01/03/2004"))
desidered_data <- data.frame(date_all = c(NA, "38169", "37926", "01/03/2004 --- 01/03/2004"),
                             date_first = c(NA, "38169", "37926", "01/03/2004"),
                             # quoted and wrapped in as.Date(); unquoted 2004-07-01 is arithmetic (= 1996)
                             date_clean = as.Date(c(NA, "2004-07-01", "2003-11-01", "2004-03-01")))
> desidered_data
                   date_all date_first date_clean
1                      <NA>       <NA>       <NA>
2                     38169      38169 2004-07-01
3                     37926      37926 2003-11-01
4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01
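For reference, here is how each of the two input formats maps to a Date on its own. This is a minimal sketch, assuming the Excel serials come from Excel's default 1900 date system, where origin "1899-12-30" compensates for Excel counting 29 February 1900 as a real day; a workbook using the 1904 date system would need origin "1904-01-01" instead. The variable names are illustrative only.

```r
# Excel serials (1900 date system): day counts whose R origin is 1899-12-30
excel_serials <- c(38169, 37926)
from_excel <- as.Date(excel_serials, origin = "1899-12-30")

# Day/month/year strings are parsed with an explicit format
from_text <- as.Date("01/03/2004", format = "%d/%m/%Y")

# Once both are Date objects, they print in the standard ISO layout
format(c(from_excel, from_text), format = "%Y-%m-%d")
#> [1] "2004-07-01" "2003-11-01" "2004-03-01"
```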
You can use case_when for this:
library(dplyr)

data %>%
  mutate(date_clean = case_when(grepl("\\d{5}", date_first) ~ as.Date(as.numeric(date_first), origin = "1899-12-30"),
                                TRUE ~ as.Date(date_first, format = "%d/%m/%Y")))
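A variant of the same idea, not part of the original answer: case_when() evaluates every right-hand side on the whole column, so as.numeric() also runs on "01/03/2004" and emits an "NAs introduced by coercion" warning. The sketch below, assuming dplyr is installed, parses each format separately (a non-matching value simply becomes NA) and keeps the first non-missing result with dplyr::coalesce().

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  date_all   = c(NA, "38169", "37926", "01/03/2004 --- 01/03/2004"),
  date_first = c(NA, "38169", "37926", "01/03/2004")
)

data <- data %>%
  mutate(date_clean = coalesce(
    # Excel serials: non-numeric strings become NA (coercion warning suppressed)
    as.Date(suppressWarnings(as.numeric(date_first)), origin = "1899-12-30"),
    # dd/mm/yyyy strings: values that do not match the format become NA
    as.Date(date_first, format = "%d/%m/%Y")
  ))

data$date_clean
```

The order of the coalesce() arguments matters only if a value could be parsed both ways; with the two formats described in the question, at most one branch yields a date.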
A base R option:
change_mix_date <- function(x) {
  # Date vector of NAs, one per input value, to store the results
  new_date <- rep(as.Date(NA), length(x))
  # Check for values that contain only digits (Excel serial dates)
  inds <- grepl('^\\d+$', x)
  # Convert Excel serials (days since 1899-12-30) to Date
  new_date[inds] <- as.Date(as.numeric(x[inds]), origin = '1899-12-30')
  # Convert the remaining dd/mm/yyyy strings with an explicit format
  new_date[!inds] <- as.Date(x[!inds], format = '%d/%m/%Y')
  # Return output
  new_date
}
data$date_clean <- change_mix_date(data$date_first)
#                   date_all date_first date_clean
#1                      <NA>       <NA>       <NA>
#2                     38169      38169 2004-07-01
#3                     37926      37926 2003-11-01
#4 01/03/2004 --- 01/03/2004 01/03/2004 2004-03-01
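The answer does not say why it fills a pre-built Date vector by index rather than calling ifelse(), but one practical benefit is that ifelse() drops the Date class: the result keeps the attributes of the test vector, so dates come back as bare day counts since 1970-01-01. A small illustration (variable names are illustrative only):

```r
d <- as.Date("2004-07-01")

# ifelse() keeps the attributes of `test`, not of `yes`, so the Date class is lost
res_ifelse <- ifelse(TRUE, d, NA)
class(res_ifelse)   # "numeric"
res_ifelse          # 12600, the number of days since 1970-01-01

# Indexed assignment into an existing Date vector keeps the class
res_index <- rep(as.Date(NA), 1)
res_index[TRUE] <- d
class(res_index)    # "Date"
```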