Convert datetime strings and order by date

I have a basic problem in R: converting strings that represent date-times into objects that R can understand (POSIXct?).

I have a character vector of date-times like this:

 [1] "Thu Apr 19 00:42:24 +0000 2018" "Sat Apr 14 03:08:30 +0000 2018" "Thu Apr 02 12:42:07 +0000 2015"
 [4] "Wed Apr 25 02:24:49 +0000 2018" "Sun Apr 03 00:37:19 +0000 2016" "Fri Apr 11 10:02:42 +0000 2014"
 [7] "Tue Jan 09 13:57:33 +0000 2018" "Wed Apr 13 09:45:05 +0000 2016" "Thu May 18 11:26:10 +0000 2017"
[10] "Thu Oct 05 03:41:32 +0000 2017"

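My goal is to sort these values so that the most recent date is at the top and the earliest date is at the bottom. As far as I can tell, this means converting the strings to date-time objects, but I can't even get that step to work.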

I tried:

lubridate::as_date(dates[1], tz = "UTC", format = NULL)
as.POSIXct(dates[1], tz = "UTC")

But I always get the following error:

Error in as.POSIXlt.character(x, tz, ...) : 
character string is not in a standard unambiguous format

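I assume I can fix this by specifying the format argument, but how do I do that? Also, once I have converted them (or without converting, if that isn't necessary), how do I sort these dates?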

Any help would be appreciated, thanks in advance!

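Here is an approach that uses a regular expression to drop the extraneous +0000 and move the year next to the month and day, then uses lubridate's parser to get the desired output. This may suit you if you'd rather write regular expressions than memorize strptime codes.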

library(stringr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
dates <- c(
  "Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
  "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
  "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
  "Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
  "Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017"
)

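# Rearrange "Thu Apr 19 00:42:24 +0000 2018" into "Apr 19 2018 00:42:24",
# then parse it as month-day-year hour:minute:second (UTC by default).
# Backslashes must be doubled inside R string literals.
dates %>%
  str_replace_all("(^.{4})(.{6} )(.{8})( \\+0000 )(\\d{4})$", "\\2\\5 \\3") %>%
  mdy_hms()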
#>  [1] "2018-04-19 00:42:24 UTC" "2018-04-14 03:08:30 UTC"
#>  [3] "2015-04-02 12:42:07 UTC" "2018-04-25 02:24:49 UTC"
#>  [5] "2016-04-03 00:37:19 UTC" "2014-04-11 10:02:42 UTC"
#>  [7] "2018-01-09 13:57:33 UTC" "2016-04-13 09:45:05 UTC"
#>  [9] "2017-05-18 11:26:10 UTC" "2017-10-05 03:41:32 UTC"

Created on 2018-07-27 by the reprex package (v0.2.0).
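The pipeline above parses the strings but does not yet sort them. Here is a minimal sketch of the remaining step, assuming the same stringr/lubridate setup. It calls order() on the parsed POSIXct values with decreasing = TRUE, which puts the most recent date first. Indexing the original strings with that permutation keeps their original text. The names parsed and newest_first are illustrative only.

```r
library(stringr)
library(lubridate)

dates <- c(
  "Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
  "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
  "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
  "Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
  "Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017"
)

# Same rearrangement as above, written without the pipe:
# "Thu Apr 19 00:42:24 +0000 2018" -> "Apr 19 2018 00:42:24" -> POSIXct (UTC)
parsed <- mdy_hms(str_replace_all(
  dates,
  "(^.{4})(.{6} )(.{8})( \\+0000 )(\\d{4})$",
  "\\2\\5 \\3"
))

# order() returns the index permutation; decreasing = TRUE puts the newest first
newest_first <- dates[order(parsed, decreasing = TRUE)]
print(newest_first)
```

If you only need the parsed values themselves, sort(parsed, decreasing = TRUE) returns the sorted POSIXct vector directly.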

Alternatively, we can use order(as.Date()) with an explicit format string:

> dt[order(as.Date(dt, format="%a %b %d %H:%M:%S %z %Y"))]
 [1] "Fri Apr 11 10:02:42 +0000 2014" "Thu Apr 02 12:42:07 +0000 2015" "Sun Apr 03 00:37:19 +0000 2016"
 [4] "Wed Apr 13 09:45:05 +0000 2016" "Thu May 18 11:26:10 +0000 2017" "Thu Oct 05 03:41:32 +0000 2017"
 [7] "Tue Jan 09 13:57:33 +0000 2018" "Sat Apr 14 03:08:30 +0000 2018" "Thu Apr 19 00:42:24 +0000 2018"
[10] "Wed Apr 25 02:24:49 +0000 2018"
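The as.Date() call above discards the time of day and sorts oldest first, but the question asks for the newest date on top. Below is a base-R sketch of a variant that rests on two assumptions. First, parsing with as.POSIXct() and the same format string keeps the time component, so two entries from the same day would not tie. Second, the %a and %b codes match English day and month abbreviations only in an English or C locale, so LC_TIME is temporarily set to "C". That step may be unnecessary if your locale is already English. The snippet defines its own copy of the data so that it runs on its own.

```r
# Sample data (same strings as in the question)
dt <- c("Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
        "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
        "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
        "Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
        "Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017")

# %a/%b expect English names here, so parse under the C locale and restore afterwards
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
parsed <- as.POSIXct(dt, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))

# Newest first: order() gives the permutation, decreasing = TRUE reverses the direction
dt_sorted <- dt[order(parsed, decreasing = TRUE)]
print(dt_sorted)
```

If any element of parsed comes back as NA, the format string (or the locale) does not match that input string.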

Data

dt <- c("Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018" ,
        "Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018", 
        "Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
        "Tue Jan 09 13:57:33 +0000 2018" ,"Wed Apr 13 09:45:05 +0000 2016" ,
        "Thu May 18 11:26:10 +0000 2017","Thu Oct 05 03:41:32 +0000 2017")