How to import 4-digit year values instead of 2-digit using read.csv

Question

I need to import dates using read.csv. The dates in the CSV file are in "dd-mm-yyyy" format. Sample data is below.

UniqueID  DOB
1         01-04-1984
2         24-08-1904
3         12-12-2006
4         05-05-1870

read.csv converts the dates to "dd-mm-yy" format, even when I import them as character. I need it to import the full 4-digit years.

My code:

x <- read.csv("file", header = TRUE, colClasses = c("DOB" = "character"))

I also tried:

x <- read.csv("file", header=TRUE, stringsAsFactors = FALSE)

The result of both:

UniqueID  DOB
1         01-04-84
2         24-08-04
3         12-12-06
4         05-08-70

> class(x$DOB)
[1] "character"

When I then apply as.Date to this column, I get wrong values:

> as.Date(x$DOB, format="%d-%m-%y")
[1] "1984-04-01" "2004-08-24" "2006-12-12" "1970-08-05"

I read that as.Date automatically maps 2-digit years 00 to 68 to the 21st century (20xx) and 69 to 99 to the 20th century (19xx).
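The pivot rule described above can be observed directly by parsing the same dates with `%y` (2-digit year) and `%Y` (4-digit year). This is a minimal sketch with hard-coded example strings; it only illustrates how `as.Date` interprets the year field.

```r
# 2-digit years: %y maps 00-68 to 20xx and 69-99 to 19xx
short <- c("01-04-84", "24-08-04", "12-12-06", "05-05-70")
parsed_short <- as.Date(short, format = "%d-%m-%y")
print(parsed_short)

# 4-digit years: %Y keeps the century exactly as written
full <- c("01-04-1984", "24-08-1904", "12-12-2006", "05-05-1870")
parsed_full <- as.Date(full, format = "%d-%m-%Y")
print(parsed_full)
```

Once the century is missing from the strings, no format specifier can recover it: "04" is parsed as 2004 whether the original year was 1904 or 2004.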

So I think I am making a mistake in the read.csv call itself.
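One way to test this hypothesis is to compare the raw file contents with what read.csv returns. The sketch below writes hypothetical sample data to a temporary file so it runs on its own; pointing `readLines()` at the real file instead shows whether the 4-digit years are actually present on disk. If `readLines()` already shows 2-digit years, the file itself was saved that way (a spreadsheet program re-saving the CSV is one possible cause, though that is an assumption) and no read.csv option can restore them.

```r
# Hypothetical sample data with 4-digit years, written to a temporary CSV file
path <- tempfile(fileext = ".csv")
writeLines(c("UniqueID,DOB",
             "1,01-04-1984",
             "2,24-08-1904",
             "3,12-12-2006",
             "4,05-05-1870"), path)

# The raw lines exactly as stored on disk
raw_lines <- readLines(path)
print(raw_lines)

# read.csv with DOB forced to character returns the strings unchanged
x <- read.csv(path, header = TRUE, colClasses = c(DOB = "character"))
print(x$DOB)
print(nchar(x$DOB))  # 10 characters each, so the year still has 4 digits
```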

Answer 1

I haven't found a way to do what you want in a single line, but if you can split the task into two steps, try this:

library(dplyr) # data frame operations
library(lubridate) # tidyverse-compliant package for operations on dates

x <- read.csv("file.csv", header=TRUE, stringsAsFactors=FALSE)
x <- x %>% mutate(DOB = as.Date(DOB, format="%d-%m-%Y"))
x %>% mutate(year = lubridate::year(DOB)) # just to verify that the operations on dates work as expected
#   UniqueID        DOB year
# 1        1 1984-04-01 1984
# 2        2 1904-08-24 1904
# 3        3 2006-12-12 2006
# 4        4 1870-05-05 1870
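The same conversion can also be done in base R without dplyr or lubridate. This is a minimal sketch assuming the file really contains 4-digit years; the sample data is inlined with the `text =` argument of read.csv so the block runs without a file. As in the answer, the key point is `%Y` rather than `%y`.

```r
# Hypothetical sample data inlined via text= instead of reading a file
csv_text <- "UniqueID,DOB
1,01-04-1984
2,24-08-1904
3,12-12-2006
4,05-05-1870"

x <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# %Y (capital Y) parses the 4-digit year as written, so 1904 and 1870 survive
x$DOB <- as.Date(x$DOB, format = "%d-%m-%Y")

# Extract the year as an integer to confirm the century was kept
x$year <- as.integer(format(x$DOB, "%Y"))
print(x)
```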
