Convert datetime strings and order by date
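I have a fundamental problem: getting strings that represent datetimes into an object that R can understand (POSIXct?).
I have a character vector of datetimes that looks like this: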
[1] "Thu Apr 19 00:42:24 +0000 2018" "Sat Apr 14 03:08:30 +0000 2018" "Thu Apr 02 12:42:07 +0000 2015"
[4] "Wed Apr 25 02:24:49 +0000 2018" "Sun Apr 03 00:37:19 +0000 2016" "Fri Apr 11 10:02:42 +0000 2014"
[7] "Tue Jan 09 13:57:33 +0000 2018" "Wed Apr 13 09:45:05 +0000 2016" "Thu May 18 11:26:10 +0000 2017"
[10] "Thu Oct 05 03:41:32 +0000 2017"
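My goal is to sort these values so that the most recent date is at the top and the oldest is at the bottom. As far as I can tell, this involves converting the strings to datetime objects, but I haven't gotten even that step to work.
I tried: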
lubridate::as_date(dates[1], tz = "UTC", format = NULL)
as.POSIXct(dates[1], tz = "UTC")
But I always get the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
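I think I can solve this by specifying the format argument, but how do I do that?
Also, once I have converted them (or without converting them, if that isn't necessary), how do I sort these dates?
Any help would be appreciated.
Thanks in advance!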
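Here is an approach that uses a regular expression to drop the extraneous +0000 and move the year next to the month and day, then uses lubridate's parser to get the desired output. It may suit you if you prefer regular expressions to remembering strptime codes...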
library(stringr)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
dates <- c(
"Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018",
"Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
"Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
"Tue Jan 09 13:57:33 +0000 2018", "Wed Apr 13 09:45:05 +0000 2016",
"Thu May 18 11:26:10 +0000 2017", "Thu Oct 05 03:41:32 +0000 2017"
)
# Backslashes must be doubled inside an R string literal.
dates %>%
  str_replace_all("(^.{4})(.{6} )(.{8})( \\+0000 )(\\d{4})$", "\\2\\5 \\3") %>%
  mdy_hms()
#> [1] "2018-04-19 00:42:24 UTC" "2018-04-14 03:08:30 UTC"
#> [3] "2015-04-02 12:42:07 UTC" "2018-04-25 02:24:49 UTC"
#> [5] "2016-04-03 00:37:19 UTC" "2014-04-11 10:02:42 UTC"
#> [7] "2018-01-09 13:57:33 UTC" "2016-04-13 09:45:05 UTC"
#> [9] "2017-05-18 11:26:10 UTC" "2017-10-05 03:41:32 UTC"
Created on 2018-07-27 by the reprex package (v0.2.0).
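The pipeline above parses the strings but stops short of the sort the question asks for. As a minimal sketch, the same rearrangement can be done in base R with sub(), which makes the intermediate string visible, followed by a newest-first sort. The three-element vector is a subset of the question's data, and the month names are assumed to be parsed in an English locale.

```r
dates <- c(
  "Thu Apr 19 00:42:24 +0000 2018",
  "Fri Apr 11 10:02:42 +0000 2014",
  "Thu Oct 05 03:41:32 +0000 2017"
)

# Drop the weekday and the "+0000" offset, and move the year next to the
# month and day. Group 1 = "Apr 19 ", group 2 = time, group 3 = year.
rearranged <- sub("^.{4}(.{6} )(.{8}) \\+0000 (\\d{4})$", "\\1\\3 \\2", dates)
rearranged[1]  # "Apr 19 2018 00:42:24"

# Parse the rearranged strings (month abbreviations assume an English locale),
# then sort so the most recent datetime comes first.
parsed <- as.POSIXct(rearranged, format = "%b %d %Y %H:%M:%S", tz = "UTC")
newest_first <- sort(parsed, decreasing = TRUE)
format(newest_first, "%Y-%m-%d %H:%M:%S")
```

Dropping the offset is only safe here because every value is +0000; with mixed offsets the %z field would have to be parsed rather than discarded.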
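Alternatively, we can use order(as.Date()). The result below is oldest first; passing decreasing = TRUE to order() gives newest first.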
> dt[order(as.Date(dt, format="%a %b %d %H:%M:%S %z %Y"))]
[1] "Fri Apr 11 10:02:42 +0000 2014" "Thu Apr 02 12:42:07 +0000 2015" "Sun Apr 03 00:37:19 +0000 2016"
[4] "Wed Apr 13 09:45:05 +0000 2016" "Thu May 18 11:26:10 +0000 2017" "Thu Oct 05 03:41:32 +0000 2017"
[7] "Tue Jan 09 13:57:33 +0000 2018" "Sat Apr 14 03:08:30 +0000 2018" "Thu Apr 19 00:42:24 +0000 2018"
[10] "Wed Apr 25 02:24:49 +0000 2018"
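One caveat with as.Date() is that it keeps only the calendar date, so two timestamps on the same day compare as equal. A hedged variant that keeps the time of day and puts the newest value first, as the question asks, could look like the sketch below. The second string is a hypothetical same-day value added for illustration and is not in the original data; %a and %b are assumed to be parsed in an English locale.

```r
dt <- c(
  "Thu Apr 19 00:42:24 +0000 2018",
  "Thu Apr 19 23:15:00 +0000 2018",  # hypothetical same-day value, not in the original data
  "Fri Apr 11 10:02:42 +0000 2014"
)

# %z reads the "+0000" UTC offset; the result is a POSIXct vector of instants.
parsed <- as.POSIXct(dt, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Reorder the original strings by the parsed instants, newest first.
dt_sorted <- dt[order(parsed, decreasing = TRUE)]
dt_sorted
```

Indexing the original vector with order() keeps the strings unchanged; use sort(parsed, decreasing = TRUE) instead if the POSIXct values themselves are wanted.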
Data
dt <- c("Thu Apr 19 00:42:24 +0000 2018", "Sat Apr 14 03:08:30 +0000 2018" ,
"Thu Apr 02 12:42:07 +0000 2015", "Wed Apr 25 02:24:49 +0000 2018",
"Sun Apr 03 00:37:19 +0000 2016", "Fri Apr 11 10:02:42 +0000 2014",
"Tue Jan 09 13:57:33 +0000 2018" ,"Wed Apr 13 09:45:05 +0000 2016" ,
"Thu May 18 11:26:10 +0000 2017","Thu Oct 05 03:41:32 +0000 2017")