How to identify columns by pattern and convert them to datetime using R?
I have a data frame named data, shown below:
structure(list(NRIC_ID = c(1234L, 4567L, 1234L, 3578L, 2468L),
ADMIT_DATE = structure(c(2L, 3L, 4L, 5L, 1L), .Label = c("11/11/2011",
"2/12/2016", "3/11/2019", "5/7/2018", "7/7/2014"), class = "factor"),
test_date = structure(c(3L, 5L, 2L, 1L, 4L), .Label = c(" 2014-17-11",
"10/8/2013", "11/2/2012", "12/12/2012", "12/2/2014"), class = "factor"),
test2_DATE = structure(c(2L, 3L, 4L, 5L, 1L), .Label = c("11/11/2011",
"2/12/2016", "3/11/2019", "5/7/2018", "7/7/2014"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L))
Currently, when I use the following code to get the data types of the columns
(sapply(data, class))
the output I get is Integer, Date, and character.
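Now I would like to use pattern matching to identify all columns whose names contain DATE or date, and convert them to the Date data type.
I tried the pattern-matching code below: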
data %>%
select(contains("date")) %>% #how to ignore case here
as.Date()
When I do this, I get the following error:
Error in as.Date.default: do not know how to convert '.' to class "Date"
Can you help me resolve this?
If you want to operate on columns, use mutate.
library(dplyr)
df %>%
mutate(across(contains('date', ignore.case = TRUE),
~ lubridate::parse_date_time(.x, orders = c('dmY', 'mdY', 'Ydm'))))
# NRIC_ID ADMIT_DATE test_date test2_DATE
#1 1234 2016-12-02 2012-02-11 2016-12-02
#2 4567 2019-11-03 2014-02-12 2019-11-03
#3 1234 2018-07-05 2013-08-10 2018-07-05
#4 3578 2014-07-07 2014-11-17 2014-07-07
#5 2468 2011-11-11 2012-12-12 2011-11-11
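The original attempt fails because as.Date() receives the whole data frame rather than one column at a time. As a rough sketch of the same column-wise idea in base R, the example below selects columns by name with grep() and converts each one with lapply(). The small data frame and its values are made up for illustration, and it assumes every date string is in day/month/year form, which is not true of the mixed-format test_date column in the question.

```r
# Hypothetical example data: factor columns holding day/month/year strings
df <- data.frame(
  id = 1:3,
  ADMIT_DATE = c("2/12/2016", "3/11/2019", "7/7/2014"),
  test_date = c("11/2/2012", "12/2/2014", "10/8/2013"),
  stringsAsFactors = TRUE
)

# Find columns whose names contain "date", ignoring case
date_cols <- grep("date", names(df), ignore.case = TRUE, value = TRUE)

# Convert each matched column separately.
# as.character() is needed because the columns are factors;
# trimws() removes stray leading or trailing spaces.
df[date_cols] <- lapply(df[date_cols], function(x) {
  as.Date(trimws(as.character(x)), format = "%d/%m/%Y")
})

sapply(df, class)
```

With a single explicit format there is no guessing involved, so a string in any other layout becomes NA instead of being silently misread.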
Using data.table together with @Ronak Shah's solution:
library(data.table)
library(lubridate)
dt <- as.data.table(df)
cols_date <- grep(pattern = "date", x = names(dt), ignore.case = TRUE, value = TRUE)
dt[, lapply(.SD, parse_date_time, c('dmY', 'mdY', 'Ydm')), .SDcols = cols_date]
#> ADMIT_DATE test_date test2_DATE
#> 1: 2016-02-12 2012-02-11 2016-12-02
#> 2: 2019-03-11 2014-02-12 2019-11-03
#> 3: 2018-05-07 2013-08-10 2018-07-05
#> 4: 2014-07-07 2014-11-17 2014-07-07
#> 5: 2011-11-11 2012-12-12 2011-11-11
Created on 2020-12-23 by the reprex package (v0.3.0)
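The data.table call above returns a new table containing only the converted columns. If the goal is to keep the other columns and update the date columns in place, one possible variation uses the `:=` operator. This is only a sketch: the sample table is invented, and it swaps parse_date_time for base as.Date with a single assumed day/month/year format so that it runs without lubridate.

```r
library(data.table)

# Hypothetical example data with day/month/year strings
dt <- data.table(
  id = 1:2,
  ADMIT_DATE = c("2/12/2016", "7/7/2014"),
  test_date = c("11/2/2012", "10/8/2013")
)

# Column names containing "date", ignoring case
cols_date <- grep("date", names(dt), ignore.case = TRUE, value = TRUE)

# Overwrite the matched columns by reference; other columns are untouched
dt[, (cols_date) := lapply(.SD, as.Date, format = "%d/%m/%Y"),
   .SDcols = cols_date]

str(dt)
```

Wrapping cols_date in parentheses on the left of `:=` tells data.table to use the names stored in the variable rather than create a column literally called cols_date.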
We can use anydate from the anytime package:
library(dplyr)
library(anytime)
addFormats("%Y-%d-%m")
df %>%
mutate(across(ends_with('DATE'), anydate))
Output:
# NRIC_ID ADMIT_DATE test_date test2_DATE
#1 1234 2016-02-12 2012-11-02 2016-02-12
#2 4567 2019-03-11 2014-12-02 2019-03-11
#3 1234 2018-05-07 2013-10-08 2018-05-07
#4 3578 2014-07-07 2014-11-17 2014-07-07
#5 2468 2011-11-11 2012-12-12 2011-11-11
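The outputs above disagree on whether a value such as 2/12/2016 is 2 December or 12 February, because each parser guesses the layout. When the intended formats are known, a base R sketch like the following tries them in a fixed order for each element, so the interpretation is explicit. The helper name parse_mixed_dates and the order of the formats are assumptions made for illustration; they are not part of any answer above.

```r
# Hypothetical helper: try each format in turn, filling only the
# elements that earlier formats could not parse
parse_mixed_dates <- function(x, formats = c("%d/%m/%Y", "%Y-%d-%m")) {
  x <- trimws(as.character(x))          # handle factors and stray spaces
  out <- rep(as.Date(NA), length(x))    # start with all-missing dates
  for (fmt in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Values taken from the test_date column in the question
res <- parse_mixed_dates(c(" 2014-17-11", "10/8/2013", "12/12/2012"))
print(res)
```

Here day/month/year is tried first, so "10/8/2013" is read as 10 August 2013. Reordering or replacing the formats changes that choice, and any string matching none of them stays NA.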