How to identify columns by pattern and convert them to datetime using R?

I have a data frame named data, shown below

 structure(list(NRIC_ID = c(1234L, 4567L, 1234L, 3578L, 2468L), 
    ADMIT_DATE = structure(c(2L, 3L, 4L, 5L, 1L), .Label = c("11/11/2011", 
    "2/12/2016", "3/11/2019", "5/7/2018", "7/7/2014"), class = "factor"), 
    test_date = structure(c(3L, 5L, 2L, 1L, 4L), .Label = c(" 2014-17-11", 
    "10/8/2013", "11/2/2012", "12/12/2012", "12/2/2014"), class = "factor"), 
    test2_DATE = structure(c(2L, 3L, 4L, 5L, 1L), .Label = c("11/11/2011", 
    "2/12/2016", "3/11/2019", "5/7/2018", "7/7/2014"), class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

Currently, when I get the data types of the columns using the code below

(sapply(data, class))

I get the output as Integer, Date, character

Now I would like to use pattern matching to identify all columns whose names contain DATE or date, and convert them to the Date data type

I tried the pattern matching code below

data %>%
  select(contains("date")) %>%  #how to ignore case here
  as.Date()

When I do this, I get the following error

Error in as.Date.default: do not know how to convert '.' to class "Date"

Can you help me with this?

If you want to perform an operation on columns, use mutate

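library(dplyr)

# df is the data frame from the question (called data there)
df %>% 
  mutate(across(contains('date', ignore.case = TRUE), 
                # lambda form; passing extra arguments through across() is deprecated in dplyr >= 1.1.0
                ~ lubridate::parse_date_time(.x, orders = c('dmY', 'mdY', 'Ydm'))))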

#  NRIC_ID ADMIT_DATE  test_date test2_DATE
#1    1234 2016-12-02 2012-02-11 2016-12-02
#2    4567 2019-11-03 2014-02-12 2019-11-03
#3    1234 2018-07-05 2013-08-10 2018-07-05
#4    3578 2014-07-07 2014-11-17 2014-07-07
#5    2468 2011-11-11 2012-12-12 2011-11-11
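
For readers who prefer to avoid extra packages, the same idea can be sketched in base R. This is only an illustration under stated assumptions: parse_mixed_date is a hypothetical helper (not part of any package), and its format list assumes that slash-separated values are day-first, as in the output above, with %Y-%d-%m covering the single year-first value.

```r
# Sample data shaped like the question's data frame (strings become factors here).
data <- data.frame(
  NRIC_ID    = c(1234L, 4567L, 1234L, 3578L, 2468L),
  ADMIT_DATE = c("2/12/2016", "3/11/2019", "5/7/2018", "7/7/2014", "11/11/2011"),
  test_date  = c("11/2/2012", "12/2/2014", "10/8/2013", " 2014-17-11", "12/12/2012"),
  test2_DATE = c("2/12/2016", "3/11/2019", "5/7/2018", "7/7/2014", "11/11/2011"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: apply each format to the values that are still unparsed.
parse_mixed_date <- function(x, formats = c("%d/%m/%Y", "%Y-%d-%m")) {
  x <- trimws(as.character(x))          # factor -> character, drop stray spaces
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                  # only values not parsed yet
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Pick columns whose names contain "date", ignoring case, and convert them in place.
is_date_col <- grepl("date", names(data), ignore.case = TRUE)
data[is_date_col] <- lapply(data[is_date_col], parse_mixed_date)

sapply(data, class)
```

Unlike parse_date_time, which returns POSIXct values, this sketch yields columns of class Date. Values matching none of the listed formats are left as NA, so it is worth checking for NA afterwards.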

Using data.table and @Ronak Shah's solution

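library(data.table)
library(lubridate)

# df is the data frame from the question
dt <- as.data.table(df)
# names of all columns containing "date", ignoring case
cols_date <- grep(pattern = "date", x = names(dt), ignore.case = TRUE, value = TRUE)
# parse only those columns, using the listed candidate orders
dt[, lapply(.SD, parse_date_time, c('dmY', 'mdY', 'Ydm')), .SDcols = cols_date]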

#>    ADMIT_DATE  test_date test2_DATE
#> 1: 2016-02-12 2012-02-11 2016-12-02
#> 2: 2019-03-11 2014-02-12 2019-11-03
#> 3: 2018-05-07 2013-08-10 2018-07-05
#> 4: 2014-07-07 2014-11-17 2014-07-07
#> 5: 2011-11-11 2012-12-12 2011-11-11

Created on 2020-12-23 by the reprex package (v0.3.0)
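
The data.table call above returns a new table containing only the converted columns. If the goal is to update the date columns inside the original table and keep NRIC_ID, data.table's := assignment could be used instead. A minimal sketch, assuming the question's column names and a small hand-built sample table rather than the full data:

```r
library(data.table)
library(lubridate)

# Small sample table with the question's column names (values copied from the question).
dt <- data.table(
  NRIC_ID    = c(1234L, 4567L),
  ADMIT_DATE = c("2/12/2016", "3/11/2019"),
  test_date  = c("11/2/2012", "12/2/2014"),
  test2_DATE = c("2/12/2016", "3/11/2019")
)

cols_date <- grep("date", names(dt), ignore.case = TRUE, value = TRUE)

# Overwrite the matched columns by reference; other columns are left untouched.
dt[, (cols_date) := lapply(.SD, parse_date_time, orders = c("dmY", "mdY", "Ydm")),
   .SDcols = cols_date]

sapply(dt, class)
```

parse_date_time returns POSIXct values, so wrapping the call in as.Date() is one way to end up with Date columns if that class is required.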

We can use anydate from anytime

library(dplyr)
library(anytime)
addFormats("%Y-%d-%m")
df %>%
    mutate(across(ends_with('DATE'), anydate))

-output

#   NRIC_ID ADMIT_DATE  test_date test2_DATE
#1    1234 2016-02-12 2012-11-02 2016-02-12
#2    4567 2019-03-11 2014-12-02 2019-03-11
#3    1234 2018-05-07 2013-10-08 2018-05-07
#4    3578 2014-07-07 2014-11-17 2014-07-07
#5    2468 2011-11-11 2012-12-12 2011-11-11
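
Note that this output reads a value such as 2/12/2016 as month/day/year, whereas the mutate output earlier in the thread reads it as day/month/year. A small base R check, offered only as an illustration, shows how the same string maps to two different dates depending on the assumed order:

```r
x <- "2/12/2016"

# Interpret the string as day/month/year.
day_first <- as.Date(x, format = "%d/%m/%Y")

# Interpret the same string as month/day/year.
month_first <- as.Date(x, format = "%m/%d/%Y")

format(day_first, "%Y-%m-%d")    # "2016-12-02"
format(month_first, "%Y-%m-%d")  # "2016-02-12"
```

Which reading is correct depends on how the dates were recorded, so it is worth confirming the convention before choosing the orders or formats.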