Unable to extract day, month and year into three separate columns from a dataframe's "Date" column in the format "dd/mm/yyyy" or "dd/m/yyyy"

I am trying to use extract from tidyr:

library(dplyr)
library(tidyr)
library(stringr)

# Dataframe has "Date" column and date in the format "dd/mm/yyyy" or "dd/m/yyyy"
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010", "20/02/2010", "25/3/2010", "31/03/2010"))

# extract into three columns
df %>% extract(Date, c("Day", "Month", "Year"), "([^/]+), ([^/]+), ([^)]+)")

But the code above returns:

   Day Month Year
1 <NA>  <NA> <NA>
2 <NA>  <NA> <NA>
3 <NA>  <NA> <NA>
4 <NA>  <NA> <NA>
5 <NA>  <NA> <NA>
6 <NA>  <NA> <NA>

How can I correctly extract the day, month and year to get the expected result:

  Day Month Year
1  10     1 2001
2  15     1 2010
3  15     2 2010
4  20     2 2010
5  25     3 2010
6  31     3 2010

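Your regex pattern is off: it expects a comma and a space between the groups, but the values are separated by slashes. Use this version: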

df %>% extract(Date, c("Day", "Month", "Year"), "(\\d+)/(\\d+)/(\\d+)")
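
As a quick check of why the original pattern yields only NA, the two patterns can be compared with base R's grepl. This is a minimal sketch that is not part of the original answer; the variable names dates, old_pattern and new_pattern are made up for illustration.

```r
# Two sample values in the same format as the Date column
dates <- c("10/1/2001", "15/01/2010")

# Pattern from the question: the groups must be separated by ", ",
# which never occurs in the data, so nothing can match
old_pattern <- "([^/]+), ([^/]+), ([^)]+)"

# Pattern with "/" between the groups (backslashes doubled inside an R string)
new_pattern <- "(\\d+)/(\\d+)/(\\d+)"

grepl(old_pattern, dates)  # FALSE FALSE
grepl(new_pattern, dates)  # TRUE TRUE
```

When the pattern does not match a value, extract fills every output column for that row with NA, which is what the question shows.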

In this case, separate may be easier to use:

df %>% 
  separate("Date", into=c("Day","Month","Year"), sep="/") %>% 
  mutate(Month=str_replace(Month, "^0",""))

This keeps everything as character values. If you want the values to be numeric, use

df %>% 
  separate("Date", into=c("Day","Month","Year"), sep="/", convert=TRUE)

We can use lubridate:

library(lubridate)
library(dplyr)
df %>% 
    mutate(Date = dmy(Date)) %>% # parse the character column as a Date
    mutate(across(Date, list(year = year, month = month, day = day)))
        Date Date_year Date_month Date_day
1 2001-01-10      2001          1       10
2 2010-01-15      2010          1       15
3 2010-02-15      2010          2       15
4 2010-02-20      2010          2       20
5 2010-03-25      2010          3       25
6 2010-03-31      2010          3       31

We can use read.table from base R:

read.table(text = df$Date, sep="/", header = FALSE, 
     col.names = c("Day", "Month", "Year"))
  Day Month Year
1  10     1 2001
2  15     1 2010
3  15     2 2010
4  20     2 2010
5  25     3 2010
6  31     3 2010
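
Another base R option, sketched here as an assumption rather than taken from the answers above, is to parse the strings with as.Date and pull out each part with format. The names parsed and result are made up for illustration.

```r
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010",
                          "20/02/2010", "25/3/2010", "31/03/2010"))

# Parse day/month/year; %m accepts "1" as well as "01" on input
parsed <- as.Date(df$Date, format = "%d/%m/%Y")

# format() returns zero-padded strings, so convert each part to integer
result <- data.frame(
  Day   = as.integer(format(parsed, "%d")),
  Month = as.integer(format(parsed, "%m")),
  Year  = as.integer(format(parsed, "%Y"))
)
result
```

Going through a real Date value has the side effect that an impossible date such as "31/02/2010" becomes NA instead of being split silently into its parts.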