Timezone of a POSIXct date sequence lost when coercing to data.frame in R

I would like to keep the CET/CEST part of the date sequence generated below.

seq(as.POSIXct("2018-10-01"), as.POSIXct("2018-10-02"), "hour")
myvector <- seq(as.POSIXct("2018-10-01"), as.POSIXct("2018-10-02"), "hour")
myvector
mydf <- as.data.frame(myvector)

In the console it looks like this:

> head(seq(...))

[1] "2018-10-01 00:00:00 CEST" "2018-10-01 01:00:00 CEST" "2018-10-01 02:00:00 CEST" "2018-10-01 03:00:00 CEST" "2018-10-01 04:00:00 CEST" "2018-10-01 05:00:00 CEST"

> head(myvector)

[1] "2018-10-01 00:00:00 CEST" "2018-10-01 01:00:00 CEST" "2018-10-01 02:00:00 CEST" "2018-10-01 03:00:00 CEST" "2018-10-01 04:00:00 CEST" "2018-10-01 05:00:00 CEST"

> head(mydf)
             myvector
1 2018-10-01 00:00:00
2 2018-10-01 01:00:00
3 2018-10-01 02:00:00
4 2018-10-01 03:00:00
5 2018-10-01 04:00:00
6 2018-10-01 05:00:00

The timezone is lost when I coerce the vector to a data.frame. I don't know how to keep it. I tried something like:

attr(mydf$myvector, "tzone") <- attr(myvector, "tzone")

but tzone does not seem to be a real attribute here, so it doesn't work.

What is the CEST/CET in a POSIXct? How can I keep it when coercing to a data.frame?

Thanks

You need to apply as.POSIXlt to the POSIXct column before you can get the timezone from it:

#Extract timezone from POSIXct column of a dataframe
mydf$timezone <- attr(as.POSIXlt(mydf$myvector), "tzone")[1]

head(mydf)
#             myvector      timezone
#1 2018-10-01 00:00:00 Europe/Berlin
#2 2018-10-01 01:00:00 Europe/Berlin
#3 2018-10-01 02:00:00 Europe/Berlin
#4 2018-10-01 03:00:00 Europe/Berlin
#5 2018-10-01 04:00:00 Europe/Berlin
#6 2018-10-01 05:00:00 Europe/Berlin
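
As a minimal sketch of why the attribute approach in the question appeared not to work: without an explicit tz, the tzone attribute of a POSIXct is just the empty string (meaning "session time zone"). The variant below assumes an explicit tz = "Europe/Berlin", which is not in the original post, so that the result does not depend on the machine's locale. The names tz_ct and tz_lt are illustrative only.

```r
# Assumption: explicit time zone instead of the session default.
myvector <- seq(as.POSIXct("2018-10-01", tz = "Europe/Berlin"),
                as.POSIXct("2018-10-02", tz = "Europe/Berlin"),
                by = "hour")
mydf <- data.frame(myvector = myvector)

# The tzone attribute of the column is carried over by the coercion.
tz_ct <- attr(mydf$myvector, "tzone")

# as.POSIXlt() expands tzone; its first element is the zone name.
tz_lt <- attr(as.POSIXlt(mydf$myvector), "tzone")[1]

identical(tz_ct, attr(myvector, "tzone"))
```

With an explicit zone, both routes should give the same zone name, and the column still behaves as a POSIXct in that zone.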

Sample data:

myvector <- seq(as.POSIXct("2018-10-01"), as.POSIXct("2018-10-02"), "hour")
head(myvector)
#[1] "2018-10-01 00:00:00 CEST" "2018-10-01 01:00:00 CEST" "2018-10-01 02:00:00 CEST"
#[4] "2018-10-01 03:00:00 CEST" "2018-10-01 04:00:00 CEST" "2018-10-01 05:00:00 CEST"

mydf <- as.data.frame(myvector)
head(mydf$myvector)
#[1] "2018-10-01 00:00:00 CEST" "2018-10-01 01:00:00 CEST" "2018-10-01 02:00:00 CEST"
#[4] "2018-10-01 03:00:00 CEST" "2018-10-01 04:00:00 CEST" "2018-10-01 05:00:00 CEST"    
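
The sample data above suggests that the column itself still carries the zone and that only the data.frame printout omits it. If the goal is simply to see the zone next to each timestamp, one option is a character copy made with format(..., usetz = TRUE). This is a sketch under assumptions: the explicit tz = "Europe/Berlin" and the column name with_tz are hypothetical and not part of the original post.

```r
# Assumption: explicit time zone so the output is reproducible across machines.
myvector <- seq(as.POSIXct("2018-10-01", tz = "Europe/Berlin"),
                as.POSIXct("2018-10-02", tz = "Europe/Berlin"),
                by = "hour")
mydf <- data.frame(myvector = myvector)

# Hypothetical display column: a character copy with the zone abbreviation appended.
# The explicit format keeps the time part even for midnight-only vectors.
mydf$with_tz <- format(mydf$myvector, format = "%Y-%m-%d %H:%M:%S", usetz = TRUE)

head(mydf)
```

The with_tz column is plain text, so it is suited to display or export; date arithmetic should keep using the POSIXct column.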


Alternative approach: if you really only care about the CET/CEST output

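# Strip everything up to the last whitespace, leaving only the abbreviation
# (the backslash must be doubled inside an R string)
mydf$timezone <- gsub("^.*\\s", "", format(mydf$myvector, usetz = TRUE))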

head(mydf)
#             myvector timezone
#1 2018-10-01 00:00:00     CEST
#2 2018-10-01 01:00:00     CEST
#3 2018-10-01 02:00:00     CEST
#4 2018-10-01 03:00:00     CEST
#5 2018-10-01 04:00:00     CEST
#6 2018-10-01 05:00:00     CEST
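
A possibly simpler route to the same abbreviation is the "%Z" conversion of format(), which avoids the regular expression. The sketch below assumes an explicit tz = "Europe/Berlin" and a date range spanning the switch from summer to winter time on 2018-10-28, neither of which is in the original post, so that both CEST and CET appear. The names x and df are illustrative.

```r
# Assumption: explicit zone and a range that crosses the end of DST.
x <- seq(as.POSIXct("2018-10-27 12:00", tz = "Europe/Berlin"),
         as.POSIXct("2018-10-28 12:00", tz = "Europe/Berlin"),
         by = "hour")
df <- data.frame(time = x)

# "%Z" yields the time zone abbreviation for each individual timestamp.
df$timezone <- format(df$time, "%Z")

unique(df$timezone)
```

Because the abbreviation is computed per row, timestamps before the switch are labelled CEST and those after it CET.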