Error when converting to POSIXct: some dates return NA
I have this df:
df <- structure(list(date = structure(c(4L, 6L, 7L, 8L, 9L, 10L, 11L,
1L, 2L, 5L, 3L), .Label = c("2018-03-24 00:24:14", "2018-03-24 00:54:00",
"2018-03-24 12:19:00", "2018-03-24 14:04:01", "2018-03-24 17:12:35",
"2018-03-24 18:58:57", "2018-03-24 20:48:50", "2018-03-24 21:37:42",
"2018-03-25 01:55:40", "2018-03-25 02:47:58", "2018-03-25 03:35:11"
), class = "factor")), row.names = c(NA, -11L), class = "data.frame")
I want to convert the date column to POSIXct:
df <- df %>%
mutate(date=as.POSIXct(date, format="%Y-%m-%d %H:%M:%OS"))
It seems to work:
class(df$date)
[1] "POSIXct" "POSIXt"
But... as you can see, one date returned NA:
df
date
1 2018-03-24 14:04:01
2 2018-03-24 18:58:57
3 2018-03-24 20:48:50
4 2018-03-24 21:37:42
5 2018-03-25 01:55:40
6 <NA>
7 2018-03-25 03:35:11
8 2018-03-24 00:24:14
9 2018-03-24 00:54:00
10 2018-03-24 17:12:35
11 2018-03-24 12:19:00
Why?
Session info:
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_Switzerland.1252 LC_CTYPE=English_Switzerland.1252
[3] LC_MONETARY=English_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=English_Switzerland.1252
Thanks
As @DirkEddelbuettel mentioned in the comments, this is a daylight saving time issue.
df$date
# [1] "2018-03-24 14:04:01 CET"
# [2] "2018-03-24 18:58:57 CET"
# [3] "2018-03-24 20:48:50 CET"
# [4] "2018-03-24 21:37:42 CET"
# [5] "2018-03-25 01:55:40 CET"
# [6] "2018-03-25 02:47:58" ##
# [7] "2018-03-25 03:35:11 CEST"
# [8] "2018-03-24 00:24:14 CET"
# [9] "2018-03-24 00:54:00 CET"
# [10] "2018-03-24 17:12:35 CET"
# [11] "2018-03-24 12:19:00 CET"
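as.POSIXct seems to correctly reject the sixth value, because that local time does not exist: in Switzerland (CET/CEST) clocks jumped from 02:00 straight to 03:00 on 2018-03-25, so no time between 02:00:00 and 02:59:59 occurred that day.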
as.POSIXct("2018-03-25 02:47:58", format="%Y-%m-%d %H:%M:%S")
# [1] NA
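To see the gap directly, here is a small check that is not part of the original answer. It assumes the data are Swiss local time, i.e. the IANA zone "Europe/Zurich", which matches the CET/CEST labels printed above: the last second before the switch and the first second after it turn out to be only one second apart.

```r
# Assumption: the timestamps are local time in the "Europe/Zurich" zone.
fmt <- "%Y-%m-%d %H:%M:%S"

# Last second of CET (UTC+1) and first second of CEST (UTC+2) on 2018-03-25.
before <- as.POSIXct("2018-03-25 01:59:59", format = fmt, tz = "Europe/Zurich")
after  <- as.POSIXct("2018-03-25 03:00:00", format = fmt, tz = "Europe/Zurich")

format(c(before, after), "%H:%M:%S %Z")

# Only one real second separates them, so 02:xx:xx never happened locally.
gap <- difftime(after, before, units = "secs")
gap
```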
If you still want to keep the time, you can use strptime, which returns a POSIXlt object instead of POSIXct.
strptime("2018-03-25 02:47:58", format="%Y-%m-%d %H:%M:%S")
# [1] "2018-03-25 02:47:58"
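A quick sketch of why this works: strptime() stores the parsed calendar fields (year, month, day, hour, ...) in a POSIXlt list and does not shift them at DST boundaries, so the nonexistent wall-clock time is kept exactly as typed. Converting the result back with as.POSIXct() may again give NA or a shifted time, depending on the platform.

```r
# strptime() returns a POSIXlt: a list of calendar fields, not a count of seconds.
lt <- strptime("2018-03-25 02:47:58", format = "%Y-%m-%d %H:%M:%S")

class(lt)
#> [1] "POSIXlt" "POSIXt"

# The individual fields are kept as parsed, including the "missing" 02 hour.
c(hour = lt$hour, min = lt$min, sec = lt$sec)
```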
The whole thing:
df <- transform(df, date=strptime(date, format="%Y-%m-%d %H:%M:%S"))
df
# date
# 1 2018-03-24 14:04:01
# 2 2018-03-24 18:58:57
# 3 2018-03-24 20:48:50
# 4 2018-03-24 21:37:42
# 5 2018-03-25 01:55:40
# 6 2018-03-25 02:47:58
# 7 2018-03-25 03:35:11
# 8 2018-03-24 00:24:14
# 9 2018-03-24 00:54:00
# 10 2018-03-24 17:12:35
# 11 2018-03-24 12:19:00
str(df)
# 'data.frame': 11 obs. of 1 variable:
# $ date: POSIXlt, format: ...
This is also possible with dplyr (note that some dplyr versions refuse POSIXlt columns in mutate()):
df %>% mutate(date=strptime(date, format="%Y-%m-%d %H:%M:%S"))
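An alternative that is not part of the original answer: if the timestamps are not really Swiss local time (for example, they were logged in UTC), or you only need the wall-clock values, parsing with an explicit DST-free time zone such as tz = "UTC" keeps a POSIXct column without any NA. This assumes that reading the strings as UTC is acceptable for your data; if they are genuinely Swiss local times, 02:47:58 on 2018-03-25 remains an invalid local time.

```r
# Three of the question's timestamps, stored as a factor like in the original df.
df_utc <- data.frame(date = factor(c("2018-03-25 01:55:40",
                                     "2018-03-25 02:47:58",
                                     "2018-03-25 03:35:11")))

# Assumption: treating the strings as UTC is fine. UTC has no daylight saving
# time, so no wall-clock time is skipped and nothing becomes NA.
df_utc$date <- as.POSIXct(as.character(df_utc$date),
                          format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

df_utc$date
anyNA(df_utc$date)
#> [1] FALSE
```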