Is there a reliable way to detect POSIXlt objects representing a time which does not exist due to DST?

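I have the following problem: the date column in the data I receive contains date-times that do not exist because of daylight saving time (DST). For example, 2015-03-29 02:00 does not exist in Central European Time, because the clocks were set directly from 01:59 to 03:00 when DST took effect on that day.

Is there a simple and reliable way to determine whether a date-time is valid with respect to DST?

This is not trivial because of the properties of the date-time classes.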

# generating the invalid time as POSIXlt object
test <- strptime("2015-03-29 02:00", format="%Y-%m-%d %H:%M", tz="CET")

# the object seems to represent something at least partially reasonable, notice the missing timezone specification though
test
# [1] "2015-03-29 02:00:00"

# strangely enough this object is regarded as NA by is.na
is.na(test)
# [1] TRUE

# which is no surprise if you consider:
is.na.POSIXlt
# function (x) 
# is.na(as.POSIXct(x))

as.POSIXct(test)
# [1] NA

# inspecting the interior of my POSIXlt object:
unlist(test)
# sec    min   hour   mday    mon   year   wday   yday  isdst   zone gmtoff
# "0"    "0"    "2"   "29"    "2"  "115"    "0"   "87"   "-1"     ""     NA

So the simplest approach I could think of is to check the isdst field of the POSIXlt object, which the help for POSIXt describes as follows:

isdst
Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.

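Is checking the isdst field safe, in the sense that this field is -1 only when the date-time is invalid because of a DST change, or can it be -1 for other reasons as well?

Information about version, platform and locale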

R.version
# _                           
# platform       x86_64-w64-mingw32          
# arch           x86_64                      
# os             mingw32                     
# system         x86_64, mingw32             
# status                                     
# major          3                           
# minor          3.1                         
# year           2016                        
# month          06                          
# day            21                          
# svn rev        70800                       
# language       R                           
# version.string R version 3.3.1 (2016-06-21)
# nickname       Bug in Your Hair            
Sys.getlocale()
# [1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

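The manual says that strptime does not validate whether a time exists in a particular time zone, given the transitions to/from daylight saving time (?strptime). It also says that as.POSIXct does perform this validation, so following the manual, one should check the resulting POSIXct object for NA (?as.POSIXct), which identifies nonexistent times as shown in the question's example. However, for times that occur twice in a time zone, the result is OS-specific (?as.POSIXct):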

Remember that in most time zones some times do not occur and some occur twice because of transitions to/from ‘daylight saving’ (also known as ‘summer’) time. strptime does not validate such times (it does not assume a specific time zone), but conversion by as.POSIXct will do so.

One issue is what happens at transitions to and from DST, for example in the UK

as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"))
as.POSIXct(strptime("2010-10-31 01:30:00", "%Y-%m-%d %H:%M:%S"))

are respectively invalid (the clocks went forward at 1:00 GMT to 2:00 BST) and ambiguous (the clocks went back at 2:00 BST to 1:00 GMT). What happens in such cases is OS-specific: one should expect the first to be ‘NA’, but the second could be interpreted as either BST or GMT (and common OSes give both possible values).
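To try the manual's UK example without depending on the session time zone, the sketch below passes tz = "Europe/London" explicitly. That argument and the variable names uk and uk_ct are my additions, not part of the manual. No expected output is shown because the result for the first time is OS-specific.

```r
# The two times from the manual's example, parsed with an explicit time zone
# (assumption: the tz database on this machine knows "Europe/London").
uk <- strptime(c("2011-03-27 01:30:00",   # nonexistent: clocks jumped forward
                 "2010-10-31 01:30:00"),  # ambiguous: occurs twice
               format = "%Y-%m-%d %H:%M:%S", tz = "Europe/London")

# as.POSIXct performs the validation that strptime skips.
uk_ct <- as.POSIXct(uk)

# The manual says to expect NA for the first element, but this is OS-specific.
is.na(uk_ct)

# %Z shows which interpretation (GMT or BST) was chosen for the ambiguous time.
format(uk_ct, "%Y-%m-%d %H:%M:%S %Z")
```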

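The value of as.POSIXct(test) appears to be platform-dependent, which adds a layer of complexity to finding a reliable method. On my Windows machine (R 3.3.1), as.POSIXct(test) yields NA, as the OP reported. On my Linux platform (same R version), however, I get the following: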

times = c ("2015-03-29 01:00",
           "2015-03-29 02:00",
           "2015-03-29 03:00")

test <- strptime(times, format="%Y-%m-%d %H:%M", tz="CET")

test
#[1] "2015-03-29 01:00:00 CET"  "2015-03-29 02:00:00 CEST" "2015-03-29 03:00:00 CEST"
as.POSIXct(test)
#[1] "2015-03-29 01:00:00 CET"  "2015-03-29 01:00:00 CET"  "2015-03-29 03:00:00 CEST"
as.character(test)
#[1] "2015-03-29 01:00:00" "2015-03-29 02:00:00" "2015-03-29 03:00:00"
as.character(as.POSIXct(test))
#[1] "2015-03-29 01:00:00" "2015-03-29 01:00:00" "2015-03-29 03:00:00"
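As a side check on the isdst idea from the question: in the Linux output above the nonexistent time is printed with CEST, which suggests the flag is not -1 there. A minimal sketch for inspecting this on your own platform follows; the names lt, isdst_flags and overview are only illustrative, and the flag for the 02:00 entry may differ between operating systems, so no output is shown.

```r
times <- c("2015-03-29 01:00",
           "2015-03-29 02:00",   # does not exist in CET
           "2015-03-29 03:00")

lt <- strptime(times, format = "%Y-%m-%d %H:%M", tz = "CET")

# Raw DST flag stored in the POSIXlt object: >0 in force, 0 not, <0 unknown.
isdst_flags <- lt$isdst

# Put the flag next to the NA check on the POSIXct conversion for comparison.
overview <- data.frame(time     = times,
                       isdst    = isdst_flags,
                       ct_is_na = is.na(as.POSIXct(lt)))
overview
```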

What we can rely on is not the actual value of as.POSIXct(test), but the fact that it differs from test when test is an invalid date/time:

(as.character(test) == as.character(as.POSIXct(test))) %in% TRUE
# TRUE FALSE  TRUE

I am not sure whether as.character is strictly necessary here, but I included it to make sure we do not run into any other odd behaviour of POSIX objects.
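The round-trip comparison can be wrapped in a small helper. This is only a sketch: the function name exists_in_tz and its defaults are hypothetical, and it uses format() with an explicit format string in place of as.character so that both sides are rendered the same way. It assumes that format() on the POSIXlt object returns the wall-clock fields as parsed, as in the output shown above.

```r
# Hypothetical helper: TRUE where the wall-clock time exists in `tz`,
# FALSE where it does not exist or could not be parsed at all.
exists_in_tz <- function(x, format = "%Y-%m-%d %H:%M", tz = "CET") {
  lt <- strptime(x, format = format, tz = tz)  # no DST validation here
  ct <- as.POSIXct(lt)                         # NA or shifted if nonexistent
  # Compare the round trip; `%in% TRUE` turns NA comparisons into FALSE.
  (format(lt, format) == format(ct, format, tz = tz)) %in% TRUE
}

exists_in_tz(c("2015-03-29 01:00",
               "2015-03-29 02:00",
               "2015-03-29 03:00"))
```

Note that this only flags nonexistent times. Ambiguous times at the autumn transition pass the check, since they survive the round trip whichever interpretation the OS picks.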