How to Use apply (or Write a Function) to Convert Character Dates Into Dates Across Multiple Columns in R

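I have an Excel sheet full of dates stored as character strings. I can't simply use mdy() or as.Date() to convert the original file. I worked out a way to convert the dates in one column, and I think I need the apply() or sapply() function to convert the dates in the remaining columns. The only problem is that I don't know how to do that.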

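Plain mdy() or as.Date() works on the fake data I created, but not on my original data, where it returns nothing but NA. I can't perfectly reproduce what is on the Excel sheet, but I have created some mock data below. All I want is to apply my method to every column of a data frame full of dates.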
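One way to track down why the real data comes back as NA is to list the values that fail to parse. The following is a minimal sketch, assuming an English locale and the "%b %d, %Y" layout used in the mock data; `find_unparsed` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: return the distinct non-missing strings in a column
# that as.Date() cannot parse with the given format.
find_unparsed <- function(x, format = "%b %d, %Y") {
  parsed <- as.Date(as.character(x), format = format)
  unique(as.character(x)[is.na(parsed) & !is.na(x)])
}

# Made-up example values: one good date, one nonstandard month
# abbreviation, one value in a different layout, and one missing value.
dates <- c("Jan 23, 2019", "Sept 30, 2011", "2011-09-30", NA)
find_unparsed(dates)
```

Running this on each column of the real data (for example with lapply()) should show which strings need cleaning before conversion.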

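My approach so far has been to split each character date into three separate columns and then convert them into a date. I have practiced on one column, and now I need to apply it to the rest of my columns.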

Below is an abbreviated version of my data, with randomly made-up dates and renamed columns:

mock_data <- data.frame(
  Death = (c("Jan 23, 2019", "Feb 23, 1998", "June 3, 2003", "Oct 7, 2007", "Feb 28, 2004", "Apr 19, 2014", "Mar 11, 1988", "Sept 30, 2011")),
  Birth = c("May 11, 2010", "Apr 9, 1999", "Aug 30, 1998", "Jan 08, 2003", "Feb 18, 2001", "Nov 25, 2000", "Oct 31, 2009", "Dec 11, 2011"),
  Wedding = c("June 01, 1981", "May 24, 2018", "Feb 25, 2017", "Dec 25, 2011", "Aug 14, 1967", "July 2, 2003", "Nov 30, 2000", "Feb 2, 2002")
  )

Here is my four-step code for converting the data and putting it into a new data frame:

library(tidyr)      # separate(); also re-exports %>%
library(lubridate)  # ymd()

# Split "Jan 23, 2019" at ", " into "Jan 23" and "2019", then split
# "Jan 23" at the space into month and day.
death_data <- data.frame(
  Death_Month = separate(
    separate(mock_data, col = "Death", into = c("Day_Month", "Year"), sep = ", "),
    col = "Day_Month",
    into = c("Month", "Day"),
    sep = " ")$Month,
  Death_Day = as.numeric(separate(
    separate(mock_data, col = "Death", into = c("Day_Month", "Year"), sep = ", "),
    col = "Day_Month",
    into = c("Month", "Day"),
    sep = " ")$Day),
  Death_Year = as.numeric(separate(mock_data, col = "Death", into = c("Day_Month", "Year"), sep = ", ")$Year)
)

death_data$Death_Date <- paste(death_data$Death_Year, death_data$Death_Month, death_data$Death_Day, sep="-") %>% ymd() %>% as.Date()

dates_data <- data.frame(Death = death_data$Death_Date)

dates_data

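The eventual plan is to cbind() the non-date columns of other information that I need from the original data frame. It may not be the most efficient or elegant code, but it is the only way I could think of to get it done. My method works for one column, and this code won't be handed off to anyone else.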
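The single-column method above can be wrapped in a function and applied to every date column with lapply(). Because the conversion then happens in place, the other columns stay where they are and no cbind() step is needed. This is only a sketch in base R, not the tidyr calls used above; `parse_split_date`, `date_cols`, `df`, and the `ID` column are hypothetical names introduced for illustration, and an English locale is assumed.

```r
# Hypothetical helper mirroring the split-then-rebuild approach:
# "Jan 23, 2019" -> month "Jan", day "23", year "2019" -> Date
parse_split_date <- function(x) {
  parts <- strsplit(as.character(x), "[ ,]+")        # split on spaces and commas
  month <- substr(vapply(parts, `[`, "", 1), 1, 3)   # keep 3 letters: "Sept" -> "Sep"
  day   <- vapply(parts, `[`, "", 2)
  year  <- vapply(parts, `[`, "", 3)
  as.Date(paste(year, month, day, sep = "-"), format = "%Y-%b-%d")
}

df <- data.frame(
  ID    = 1:3,                                       # made-up non-date column
  Death = c("Jan 23, 2019", "June 3, 2003", "Sept 30, 2011"),
  Birth = c("May 11, 2010", "Aug 30, 1998", "Dec 11, 2011")
)

date_cols <- c("Death", "Birth")                     # only these columns are converted
df[date_cols] <- lapply(df[date_cols], parse_split_date)
str(df)
```

Truncating the month to three letters is a design choice that happens to cope with "June", "July", and "Sept"; it would not help with values that are laid out differently.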

We can use across inside mutate to convert multiple columns. Here we need to convert all of the columns to Date class, and mdy from lubridate makes that easier:

library(dplyr)
library(lubridate)
mock_data_out <-  mock_data %>%
      mutate(across(everything(), mdy))
mock_data_out
#      Death      Birth    Wedding
#1 2019-01-23 2010-05-11 1981-06-01
#2 1998-02-23 1999-04-09 2018-05-24
#3 2003-06-03 1998-08-30 2017-02-25
#4 2007-10-07 2003-01-08 2011-12-25
#5 2004-02-28 2001-02-18 1967-08-14
#6 2014-04-19 2000-11-25 2003-07-02
#7 1988-03-11 2009-10-31 2000-11-30
#8 2011-09-30 2011-12-11 2002-02-02

Or in base R, use lapply with as.Date:

mock_data[] <- lapply(mock_data, as.Date, format = "%b %d, %Y")
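On input, "%b" in as.Date() matches both the standard three-letter abbreviations and full month names such as "June", but not the four-letter "Sept" in the original mock data, which comes back as NA. A minimal sketch of normalizing that value first, assuming an English locale and that "Sept" is the only nonstandard abbreviation present; `to_date` and `mock_dates` are hypothetical names.

```r
# Hypothetical helper: rewrite "Sept" as "Sep", then parse.
to_date <- function(x) {
  x <- sub("^Sept ", "Sep ", as.character(x))
  as.Date(x, format = "%b %d, %Y")
}

# A shortened copy of the mock data, including the problematic "Sept" value.
mock_dates <- data.frame(
  Death   = c("Jan 23, 2019", "June 3, 2003", "Sept 30, 2011"),
  Wedding = c("June 01, 1981", "July 2, 2003", "Feb 2, 2002"),
  stringsAsFactors = FALSE
)

# The empty brackets keep the data.frame structure while replacing every column.
mock_dates[] <- lapply(mock_dates, to_date)
mock_dates
```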

A solution using data.table and its IDate format:

library(data.table)
# I modified mock_data slightly, changing "Sept" to "Sep" so that R recognizes it as September

mock_data <- data.frame(
  Death = (c("Jan 23, 2019", "Feb 23, 1998", "June 3, 2003", "Oct 7, 2007", "Feb 28, 2004", "Apr 19, 2014", "Mar 11, 1988", "Sep 30, 2011")),
  Birth = c("May 11, 2010", "Apr 9, 1999", "Aug 30, 1998", "Jan 08, 2003", "Feb 18, 2001", "Nov 25, 2000", "Oct 31, 2009", "Dec 11, 2011"),
  Wedding = c("June 01, 1981", "May 24, 2018", "Feb 25, 2017", "Dec 25, 2011", "Aug 14, 1967", "July 2, 2003", "Nov 30, 2000", "Feb 2, 2002")
)

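setDT(mock_data) # convert to a data.table object by reference
# Now lapply over the columns and convert them to data.table's IDate format.
# Assigning with := updates mock_data in place; without it the result is only printed.
cols <- names(mock_data)
mock_data[, (cols) := lapply(.SD, function(x) as.IDate(x, format = "%b %d, %Y")), .SDcols = cols] # .SD is a special symbol representing the columns listed in .SDcols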

#output
mock_data
        Death      Birth    Wedding
1: 2019-01-23 2010-05-11 1981-06-01
2: 1998-02-23 1999-04-09 2018-05-24
3: 2003-06-03 1998-08-30 2017-02-25
4: 2007-10-07 2003-01-08 2011-12-25
5: 2004-02-28 2001-02-18 1967-08-14
6: 2014-04-19 2000-11-25 2003-07-02
7: 1988-03-11 2009-10-31 2000-11-30
8: 2011-09-30 2011-12-11 2002-02-02