How do I specify POSIX (time) format for 3 letter tz in R, in order to ignore it?

Question

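For output, the specification is %Z (see ?strptime). But how does it work for input?

To clarify: it would be nice if as.POSIXct() parsed the time zone abbreviation into something useful, but the core question is how to get the function to at least ignore the time zone.

Here is my best workaround. Is there a specific format code I can pass to as.POSIXct() that works for all time zones?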

times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
as.POSIXct(times, format="%a %b %d %H:%M:%S %Z %Y") # nope! strptime can't handle %Z in input

formats <- paste("%a %b %d %H:%M:%S", gsub(".+ ([A-Z]{3}) [0-9]{4}$", "\\1", times), "%Y")
as.POSIXct(times, format=formats) # works

Edit: here is the output of the last line, along with its class (from a separate call); the output is as expected. From the console:

> as.POSIXct(times, format=formats)
[1] "2015-07-03 00:15:00 EDT" "2015-07-03 00:15:00 EDT"

> attributes(as.POSIXct(times, format=formats))
$class
[1] "POSIXct" "POSIXt" 

$tzone
[1] ""

Answer 1

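The short answer is, "no, you can't." These are abbreviations, and they are not guaranteed to uniquely identify a particular time zone.

For example, is "EST" Eastern Standard Time in the US or in Australia? Is "CST" Central Standard Time in the US or in Australia, or China Standard Time, or Cuba Standard Time?

I just noticed that you are not trying to parse the time zone abbreviation; you only want to skip it. I don't know of a way to make strptime ignore arbitrary characters. I do know that it ignores anything in the character representation of the time that comes after the end of the format string. For example: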

R> # The year is not parsed, so the current year is used
R> as.POSIXct(times, format="%a %b %d %H:%M:%S")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
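One possible way to use this trailing-text behavior, offered only as a sketch and not something strptime provides directly, is to move the year in front of the abbreviation so that the abbreviation ends up after the end of the format string. The variable name `reordered` and the explicit `tz = "UTC"` are choices made here for illustration; `%a` and `%b` also assume an English locale.

```r
times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")

# Swap the last two fields: "... EDT 2015" becomes "... 2015 EDT".
reordered <- sub("([^ ]+) ([0-9]{4})$", "\\2 \\1", times)

# The format stops after %Y, so the trailing abbreviation is never examined.
# tz is set explicitly so the result does not depend on the session time zone.
parsed <- as.POSIXct(reordered, format = "%a %b %d %H:%M:%S %Y", tz = "UTC")
print(parsed)
```

Both elements are read as the same wall-clock time in the requested zone, because the abbreviation is skipped rather than interpreted.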

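Other than that, a regular expression is the only way I can think of to solve this. Unlike your example, I would apply the regex to the input character vector and remove any 3-5 character time zone abbreviation.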

R> times_no_tz <- gsub(" [[:upper:]]{3,5} ", " ", times)
R> as.POSIXct(times_no_tz, format="%a %b %d %H:%M:%S %Y")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
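If the abbreviation should be honored rather than discarded, the ambiguity described earlier means the caller has to say what each abbreviation stands for. The following is a minimal sketch under that assumption: `parse_with_tz_abbrev` and its `tz_map` argument are hypothetical names invented here, and the mapping of "EDT" to "America/New_York" is one person's choice, not something R can infer. It also assumes an English locale for `%a` and `%b` and that the named zones exist in the system time zone database.

```r
# Hypothetical helper: the name and the tz_map argument are illustrative only.
parse_with_tz_abbrev <- function(x, tz_map,
                                 format = "%a %b %d %H:%M:%S %Y") {
  # Extract the abbreviation, e.g. "EDT" from "Fri Jul 03 00:15:00 EDT 2015".
  abbrev <- sub("^.* ([[:upper:]]{3,5}) [0-9]{4}$", "\\1", x)
  unknown <- setdiff(abbrev, names(tz_map))
  if (length(unknown) > 0) {
    stop("no mapping for: ", paste(unknown, collapse = ", "))
  }
  # Remove the abbreviation so the remaining text matches the format.
  stripped <- sub(" [[:upper:]]{3,5} ", " ", x)
  # as.POSIXct() accepts a single tz, so parse one element at a time
  # and collect the results as seconds since the epoch.
  secs <- mapply(
    function(s, tz) as.numeric(as.POSIXct(s, format = format, tz = tz)),
    stripped, tz_map[abbrev], USE.NAMES = FALSE
  )
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
# The caller decides what each abbreviation means.
res <- parse_with_tz_abbrev(times, c(EDT = "America/New_York", GMT = "UTC"))
print(res)
```

Failing loudly on an unmapped abbreviation is deliberate: silently guessing between, say, US and Australian "EST" would produce times that are wrong by many hours.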
