How to use apply (or write a function) to convert character dates to Dates across multiple columns in R
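I have an Excel sheet full of dates stored as character strings. I can't actually use mdy() or as.Date() to convert the original file. I worked out a way to convert the dates in one column, and I think I need apply() or sapply() to convert the dates in the remaining columns. The only problem is that I don't know how to do that.
Calling mdy() or as.Date() directly works on the fake data I created, but not on my original data, where it returns nothing but NA. I can't perfectly reproduce what is in the Excel sheet, but I have created some mock data below. All I want is to apply my method to every date column in the data frame.
My approach so far: I have been able to split the character dates into three separate columns and then convert them to dates. I practised on one column, and now I need to apply it to the rest of my columns.
Below is an abbreviated version of my data, with randomly made-up dates and renamed columns: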
mock_data <- data.frame(
Death = (c("Jan 23, 2019", "Feb 23, 1998", "June 3, 2003", "Oct 7, 2007", "Feb 28, 2004", "Apr 19, 2014", "Mar 11, 1988", "Sept 30, 2011")),
Birth = c("May 11, 2010", "Apr 9, 1999", "Aug 30, 1998", "Jan 08, 2003", "Feb 18, 2001", "Nov 25, 2000", "Oct 31, 2009", "Dec 11, 2011"),
Wedding = c("June 01, 1981", "May 24, 2018", "Feb 25, 2017", "Dec 25, 2011", "Aug 14, 1967", "July 2, 2003", "Nov 30, 2000", "Feb 2, 2002")
)
Here is the four-step code I use to convert the data and put it into a new data frame:
library(tidyr)      # separate()
library(dplyr)      # %>%
library(lubridate)  # ymd()

# Split Death at the comma into "Month Day" and year, then split "Month Day"
# at the space, and keep the month, day and year pieces as separate columns
death_data <- data.frame(
  Death_Month = separate(
    separate(mock_data, col = "Death", into = c("Day_Month", "Year"), sep = ","),
    col = "Day_Month",
    into = c("Month", "Day"),
    sep = " ")$Month,
  Death_Day = as.numeric(separate(
    separate(mock_data, col = "Death", into = c("Day_Month", "Year"), sep = ","),
    col = "Day_Month",
    into = c("Month", "Day"),
    sep = " ")$Day),
  Death_Year = as.numeric(separate(mock_data, col = "Death", into = c("Day_Month", "Year"), sep = ",")$Year)
)
# Paste the pieces back together as year-month-day and parse the result
death_data$Death_Date <- paste(death_data$Death_Year, death_data$Death_Month, death_data$Death_Day, sep = "-") %>% ymd() %>% as.Date()
dates_data <- data.frame(Death = death_data$Death_Date)
dates_data
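The eventual plan is to cbind() the other, non-date information columns that I need from the original data frame. It may not be the most efficient or elegant code, but it is the only way I could think of to get this done. My method works for one column, and this code will not be passed on to anyone else.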
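Since mdy() and as.Date() return NA only on the real data, it may help to look at exactly which strings fail to parse. The sketch below is one possible check in base R and is not part of the original post: unparsed_values is a hypothetical helper, and raw holds made-up values, including a number stored as text of the kind a spreadsheet export can produce for dates. It assumes an English locale for the month abbreviations.

```r
# Hypothetical helper: return the distinct non-missing strings in x that
# as.Date() cannot parse with the given format, so they can be inspected.
unparsed_values <- function(x, format = "%b %d, %Y") {
  x <- as.character(x)
  failed <- !is.na(x) & is.na(as.Date(x, format = format))
  unique(x[failed])
}

# Made-up values imitating what a real column might contain
raw <- c("Jan 23, 2019", "Sept 30, 2011", "43488", "2011-09-30")

# Lists the strings that do not match "%b %d, %Y"
unparsed_values(raw)
```

Running the helper on each column of the real data frame, for example with lapply(real_data, unparsed_values) where real_data is a placeholder name, would show whether the failures come from a different layout, unexpected month spellings, or something else.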
We can use across inside mutate to convert multiple columns. Here we need to convert all the columns to the Date class, which is easier with mdy from lubridate:
library(dplyr)
library(lubridate)
mock_data_out <- mock_data %>%
mutate(across(everything(), mdy))
mock_data_out
# Death Birth Wedding
#1 2019-01-23 2010-05-11 1981-06-01
#2 1998-02-23 1999-04-09 2018-05-24
#3 2003-06-03 1998-08-30 2017-02-25
#4 2007-10-07 2003-01-08 2011-12-25
#5 2004-02-28 2001-02-18 1967-08-14
#6 2014-04-19 2000-11-25 2003-07-02
#7 1988-03-11 2009-10-31 2000-11-30
#8 2011-09-30 2011-12-11 2002-02-02
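The question mentions that the real data frame also has non-date columns. As a minimal sketch under that assumption, across can be limited to a named set of columns instead of everything(). The data frame people, its Name column, and the vector date_cols are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Made-up data: Name stands in for the non-date information columns
people <- data.frame(
  Name  = c("A", "B"),
  Death = c("Jan 23, 2019", "Feb 23, 1998"),
  Birth = c("May 11, 2010", "Apr 9, 1999"),
  stringsAsFactors = FALSE
)

# Names of the columns that hold character dates
date_cols <- c("Death", "Birth")

# Convert only the listed columns; Name is left untouched
people_out <- people %>%
  mutate(across(all_of(date_cols), mdy))

str(people_out)
```

Because the other columns stay in place, this approach would not need a separate cbind() step afterwards.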
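Or, in base R, use lapply with as.Date. Note that "Sept" is not a month abbreviation that %b recognizes, so that value becomes NA unless it is changed to "Sep":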
mock_data[] <- lapply(mock_data, as.Date, format = "%b %d, %Y")
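The same lapply pattern also works with a custom converter, which is what the question originally asks for. The following is a sketch only, written in base R rather than with separate(): split_to_date is a hypothetical function that splits each string into month, day and year and rebuilds a Date, and records is made-up data with a Name column standing in for non-date columns. It assumes every value looks like "Mon D, YYYY" with an English month name.

```r
# Hypothetical helper: split "Mon D, YYYY" strings into pieces and rebuild a Date
split_to_date <- function(x) {
  parts <- strsplit(as.character(x), "[ ,]+")   # "Jan 23, 2019" -> "Jan" "23" "2019"
  # Keep the first three letters so "Sept" and "June" match base R's month.abb
  month <- match(substr(sapply(parts, `[`, 1), 1, 3), month.abb)
  day   <- as.integer(sapply(parts, `[`, 2))
  year  <- as.integer(sapply(parts, `[`, 3))
  as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
}

# Made-up data with one non-date column
records <- data.frame(
  Name  = c("A", "B", "C"),
  Death = c("Jan 23, 2019", "Sept 30, 2011", "June 3, 2003"),
  Birth = c("May 11, 2010", "Apr 9, 1999", "Aug 30, 1998"),
  stringsAsFactors = FALSE
)

# Apply the helper to the date columns only and write the results back
date_cols <- c("Death", "Birth")
records[date_cols] <- lapply(records[date_cols], split_to_date)

str(records)
```

lapply is used rather than apply or sapply because it processes a data frame column by column and returns a list, so the Date class of each result is kept when it is assigned back.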
A solution using data.table and its IDate format:
library(data.table)
# mock_data is slightly modified here: "Sept" is changed to "Sep" so that R recognizes it as September
mock_data <- data.frame(
Death = (c("Jan 23, 2019", "Feb 23, 1998", "June 3, 2003", "Oct 7, 2007", "Feb 28, 2004", "Apr 19, 2014", "Mar 11, 1988", "Sep 30, 2011")),
Birth = c("May 11, 2010", "Apr 9, 1999", "Aug 30, 1998", "Jan 08, 2003", "Feb 18, 2001", "Nov 25, 2000", "Oct 31, 2009", "Dec 11, 2011"),
Wedding = c("June 01, 1981", "May 24, 2018", "Feb 25, 2017", "Dec 25, 2011", "Aug 14, 1967", "July 2, 2003", "Nov 30, 2000", "Feb 2, 2002")
)
setDT(mock_data) # convert to a data.table by reference
# lapply over every column and convert it to data.table's IDate class;
# .SD is a special symbol standing for all columns of mock_data.
# The result is assigned back so that mock_data holds the converted columns.
mock_data <- mock_data[, lapply(.SD, function(x) as.IDate(x, format = "%b %d, %Y"))]
# output
mock_data
Death Birth Wedding
1: 2019-01-23 2010-05-11 1981-06-01
2: 1998-02-23 1999-04-09 2018-05-24
3: 2003-06-03 1998-08-30 2017-02-25
4: 2007-10-07 2003-01-08 2011-12-25
5: 2004-02-28 2001-02-18 1967-08-14
6: 2014-04-19 2000-11-25 2003-07-02
7: 1988-03-11 2009-10-31 2000-11-30
8: 2011-09-30 2011-12-11 2002-02-02
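If the table also contains non-date columns, as the question suggests, the conversion can be restricted with .SDcols and written back in place with :=. This is a sketch under that assumption: dt, its Name column, and date_cols are made up for illustration, and the month names are assumed to be ones that %b accepts in an English locale.

```r
library(data.table)

# Made-up data: Name stands in for the non-date information columns
dt <- data.table(
  Name  = c("A", "B"),
  Death = c("Jan 23, 2019", "Feb 23, 1998"),
  Birth = c("May 11, 2010", "Apr 9, 1999")
)

# Names of the columns that hold character dates
date_cols <- c("Death", "Birth")

# .SDcols limits .SD to the date columns; := replaces them by reference
dt[, (date_cols) := lapply(.SD, as.IDate, format = "%b %d, %Y"), .SDcols = date_cols]

str(dt)
```

Updating by reference avoids copying the table and leaves the remaining columns where they are.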