Converting local date-times from a large dataset with multiple time zones into UTC
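I have been given a massive dataset with latitude, longitude, local date, and local time fields. I am trying to consolidate this information into a single ISO UTC time field. My code fails because I don't know how to use as.POSIXct() when the data contain multiple time zones. Whenever I try to pass an array to tz =, or loop over the rows, I get an error message.
I used the tz_lookup_coords() function from the lutz package to determine the time zone for each row of the data frame. I was also able to filter my data down to a single time zone and successfully use as.POSIXct() and format() to get the UTC time. However, I would like a more elegant solution that applies one piece of code to the whole dataset.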
sample <- data.frame(
  "localDate" = c("2015-04-20", "2016-07-17", "2015-08-06"),
  "localTime" = c("14:00", "14:46", NA),
  "timeZone" = c("Pacific/Pago_Pago", NA, "Pacific/Honolulu")
)
# Change times from local to UTC
sample$localDateTime <- paste(sample$localDate, sample$localTime, sep = " ")
for (i in 1:nrow(sample)) {
  sample[i,]$localDateTime <- as.POSIXct(sample[i,]$localDateTime, tz = sample[i,]$timeZone, "%Y-%m-%d %H:%M")
}
sample$eventDate <- format(sample$localDateTime, tz = "UTC", usetz = TRUE)
When I enter a single time zone such as "Pacific/Honolulu", the code runs fine, but it treats every row as being in that one time zone.
> sample
localDate localTime timeZone localDateTime eventDate
1 2015-04-20 14:00 Pacific/Pago_Pago 2015-04-20 14:00 2015-04-21 00:00:00 UTC
2 2016-07-17 14:46 Pacific/Saipan 2016-07-17 14:46 2016-07-18 00:46:00 UTC
3 2015-08-06 10:35 Pacific/Honolulu 2015-08-06 10:35 2015-08-06 20:35:00 UTC
If I try to use anything other than a quoted string in the tz = argument of the function, I get this error:
Error in strptime(x, format, tz = tz) : invalid 'tz' value
library(lubridate)
sample <- data.frame(
  "localDate" = c("2015-04-20", "2016-07-17", "2015-08-06"),
  "localTime" = c("14:00", "14:46", "10:35"),
  "timeZone" = c("Pacific/Pago_Pago", "Pacific/Saipan", "Pacific/Honolulu")
)
sample$localDateTime <- paste(sample$localDate, sample$localTime, sep = " ")
# Parse each row in its own time zone; as.character() guards against
# timeZone being stored as a factor.
list <- list()
for (i in 1:nrow(sample)) {
  list[[i]] <- ymd_hm(sample$localDateTime[i],
                      tz = as.character(sample$timeZone[i]))
}
list
R> list
[[1]]
[1] "2015-04-20 14:00:00 SST"
[[2]]
[1] "2016-07-17 14:46:00 ChST"
[[3]]
[1] "2015-08-06 10:35:00 HST"
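The list above still holds each value in its own local zone. As a minimal sketch of one way to finish the conversion, each element can be rendered in UTC and collapsed into a character vector. The sketch uses base R's as.POSIXct() as a stand-in for ymd_hm() so that it runs without lubridate; the names local, zones, parsed, and utc are illustrative and do not come from the original post.

```r
local <- c("2015-04-20 14:00", "2016-07-17 14:46", "2015-08-06 10:35")
zones <- c("Pacific/Pago_Pago", "Pacific/Saipan", "Pacific/Honolulu")

# Parse each string in its own zone (base R stand-in for the ymd_hm() loop).
parsed <- Map(
  function(x, z) as.POSIXct(x, tz = z, format = "%Y-%m-%d %H:%M"),
  local, zones
)

# Render every element in UTC with an explicit format, then collapse the
# list into a plain character vector.
utc <- vapply(
  parsed,
  function(p) format(p, format = "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = TRUE),
  character(1),
  USE.NAMES = FALSE
)
utc
```

An explicit format string is passed to format() so that every element is rendered with the same fields regardless of its time of day.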
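The error occurs because the timeZone column is a factor rather than a character vector (factors were the default for strings in data.frame() before R 4.0.0). Pass stringsAsFactors = FALSE when defining the data.frame so that timeZone is a character column. You can also avoid loops entirely by using the vectorized functions in the lubridate package: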
library(lubridate)
df <- data.frame(
  "localDate" = c("2015-04-20", "2016-07-17", "2015-08-06"),
  "localTime" = c("14:00", "14:46", "10:35"),
  "timeZone" = c("Pacific/Pago_Pago", "Pacific/Saipan", "Pacific/Honolulu"),
  stringsAsFactors = FALSE
)
df$eventDate <- force_tzs(
  ymd_hm(with(df, paste(localDate, localTime))),
  tzones = df$timeZone
)
df
#> localDate localTime timeZone eventDate
#> 1 2015-04-20 14:00 Pacific/Pago_Pago 2015-04-21 01:00:00
#> 2 2016-07-17 14:46 Pacific/Saipan 2016-07-17 04:46:00
#> 3 2015-08-06 10:35 Pacific/Honolulu 2015-08-06 20:35:00
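If lubridate is not available, a similar result can probably be reached in base R by converting one time zone at a time, since as.POSIXct() takes a single tz string per call. The sketch below loops over the distinct zones rather than over rows, which should help on a large dataset with few zones. local_to_utc() and its fmt parameter are hypothetical names introduced for illustration; they are not part of any package.

```r
# Hypothetical helper: convert local date-time strings to POSIXct in UTC,
# calling as.POSIXct() once per distinct time zone.
local_to_utc <- function(local, tz, fmt = "%Y-%m-%d %H:%M") {
  stopifnot(length(local) == length(tz))
  tz <- as.character(tz)  # tolerate a factor column
  # Start from an all-NA POSIXct vector that displays in UTC.
  out <- as.POSIXct(rep(NA_real_, length(local)),
                    origin = "1970-01-01", tz = "UTC")
  for (z in unique(tz[!is.na(tz)])) {
    idx <- which(tz == z)
    # The assignment stores the instant; out keeps its UTC display zone.
    out[idx] <- as.POSIXct(local[idx], tz = z, format = fmt)
  }
  out
}

df <- data.frame(
  localDate = c("2015-04-20", "2016-07-17", "2015-08-06"),
  localTime = c("14:00", "14:46", "10:35"),
  timeZone = c("Pacific/Pago_Pago", "Pacific/Saipan", "Pacific/Honolulu"),
  stringsAsFactors = FALSE
)
df$eventDate <- local_to_utc(paste(df$localDate, df$localTime), df$timeZone)
format(df$eventDate, "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = TRUE)
```

Rows with a missing time zone, or with a date-time string that does not match fmt, are left as NA rather than raising an error.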
Edit: When values are missing, check whether each row can be converted, and return NA if it cannot. Here is an example solution using base R:
df <- data.frame(
  "localDate" = c("2015-04-20", "2016-07-17", "2015-08-06", "2019-01-01", "2019-01-01"),
  "localTime" = c("14:00", "14:46", "10:35", NA, "00:00"),
  "timeZone" = c("Pacific/Pago_Pago", "Pacific/Saipan", "Pacific/Honolulu",
                 "Pacific/Honolulu", NA),
  stringsAsFactors = FALSE
)
# Convert row by row; a row with any missing field yields NA.
df$eventDate <- apply(df, 1, function(row) {
  if (any(is.na(row))) {
    return(NA_character_)
  }
  parsed <- as.POSIXct(paste(row[["localDate"]], row[["localTime"]]),
                       format = "%Y-%m-%d %H:%M", tz = row[["timeZone"]])
  format(parsed, tz = "UTC", usetz = TRUE)
})
df
#> localDate localTime timeZone eventDate
#> 1 2015-04-20 14:00 Pacific/Pago_Pago 2015-04-21 01:00:00 UTC
#> 2 2016-07-17 14:46 Pacific/Saipan 2016-07-17 04:46:00 UTC
#> 3 2015-08-06 10:35 Pacific/Honolulu 2015-08-06 20:35:00 UTC
#> 4 2019-01-01 <NA> Pacific/Honolulu <NA>
#> 5 2019-01-01 00:00 <NA> <NA>
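One caveat with the apply() version: any(is.na(row)) looks at every column, so on a wider data frame an NA in an unrelated field would also blank out eventDate. A possible variation, sketched below, restricts the check to the three columns the conversion needs. The latitude column and the names needed and ok are assumptions added for illustration only.

```r
df <- data.frame(
  latitude = c(NA, 15.2, 21.3, 21.3),  # illustrative extra column with a gap
  localDate = c("2015-04-20", "2016-07-17", "2015-08-06", "2019-01-01"),
  localTime = c("14:00", "14:46", NA, "00:00"),
  timeZone = c("Pacific/Pago_Pago", "Pacific/Saipan", "Pacific/Honolulu", NA),
  stringsAsFactors = FALSE
)

# Only these columns decide whether a row can be converted.
needed <- c("localDate", "localTime", "timeZone")
ok <- complete.cases(df[needed])

df$eventDate <- NA_character_
df$eventDate[ok] <- vapply(which(ok), function(i) {
  parsed <- as.POSIXct(paste(df$localDate[i], df$localTime[i]),
                       format = "%Y-%m-%d %H:%M", tz = df$timeZone[i])
  format(parsed, format = "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = TRUE)
}, character(1))
df
```

Here the first row is still converted even though its latitude is missing, while the rows lacking a time or a time zone stay NA.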