Convert a netcdf time variable to an R date object
I have a netcdf file with a time series, and the time variable has the following typical metadata:
double time(time) ;
time:standard_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "days since 1979-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
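Inside R, I want to convert the time into an R date object. At the moment I do this in a hard-wired way: I read the units attribute, split the string and use the third entry as my origin (thus assuming that the spacing is "days" and that the time is 00:00, etc.):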
require("ncdf4")
f1<-nc_open("file.nc")
time<-ncvar_get(f1,"time")
tunits<-ncatt_get(f1,"time",attname="units")
tustr<-strsplit(tunits$value, " ")
dates<-as.Date(time,origin=unlist(tustr)[3])
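This hard-wired solution works for my specific example, but I was hoping that there might be an R package that properly handles the UNIDATA netcdf date conventions for time units and converts them safely to an R date object?
Not that I know of. I have this handy function using lubridate, which is basically identical to yours.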
getNcTime <- function(nc) {
  require(lubridate)
  ncdims <- names(nc$dim)  # get netCDF dimension names
  # Find the time variable; the result is NA if no dimension name matches
  timevar <- ncdims[ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date")][1]
  if (is.na(timevar)) stop("ERROR! Could not identify the correct time variable")
  times <- ncvar_get(nc, timevar)
  timeatt <- ncatt_get(nc, timevar)  # get attributes
  timedef <- strsplit(timeatt$units, " ")[[1]]  # e.g. "days" "since" "1979-1-1" "00:00:00"
  timeunit <- timedef[1]
  tz <- timedef[5]
  # Compare the start time numerically rather than as character strings
  timestart <- suppressWarnings(as.numeric(strsplit(timedef[4], ":")[[1]]))
  if (length(timestart) != 3 || anyNA(timestart) || timestart[1] > 24 ||
      timestart[2] > 60 || timestart[3] > 60 || any(timestart < 0)) {
    warning(paste(timedef[4], "is not a valid start time. Assuming 00:00:00"))
    timedef[4] <- "00:00:00"
  }
  if (!tz %in% OlsonNames()) {
    warning(paste(tz, "is not a valid timezone. Assuming UTC"))
    tz <- "UTC"
  }
  timestart <- ymd_hms(paste(timedef[3], timedef[4]), tz = tz)
  f <- switch(tolower(timeunit),  # find the lubridate period function matching the unit
    seconds = seconds, second = seconds, sec = seconds,
    minutes = minutes, minute = minutes, min = minutes,
    hours = hours, hour = hours, h = hours,
    days = days, day = days, d = days,
    months = months, month = months, m = months,
    years = years, year = years, yr = years,
    NULL  # unknown unit
  )
  if (is.null(f)) stop("Could not understand the time unit format")
  timestart + f(times)
}
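The parsing steps in the function above can also be sketched in base R alone, which makes the logic easy to try without a netCDF file. This is a minimal sketch under stated assumptions: `cf_time_to_posixct` is a hypothetical helper name, only fixed-length units (seconds to days) are supported, any timezone suffix in the units string is ignored and UTC is assumed, and non-standard calendars (for example `360_day` or `noleap`) are not handled.

```r
# Hypothetical helper (not part of any package): convert offsets given as
# "<unit> since <origin>" into POSIXct, assuming UTC and a standard calendar.
cf_time_to_posixct <- function(values, units_string) {
  # Split e.g. "days since 1979-1-1 00:00:00" into the unit and the origin text.
  pattern <- "^[[:space:]]*([[:alpha:]]+)[[:space:]]+since[[:space:]]+(.+)$"
  parts <- regmatches(units_string, regexec(pattern, units_string))[[1]]
  if (length(parts) != 3) stop("Unrecognised units string: ", units_string)
  unit <- tolower(parts[2])

  # Seconds per unit; months and years are left out because their length varies.
  secs_per_unit <- c(seconds = 1, second = 1, minutes = 60, minute = 60,
                     hours = 3600, hour = 3600, days = 86400, day = 86400)
  if (!unit %in% names(secs_per_unit)) stop("Unsupported time unit: ", unit)

  # Try a full date-time first, then fall back to a date-only origin.
  origin <- as.POSIXct(parts[3], tz = "UTC",
                       tryFormats = c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d"))
  as.POSIXct(values * secs_per_unit[[unit]], origin = origin, tz = "UTC")
}

# Usage with the units string from the question (the values are made up):
res <- cf_time_to_posixct(c(0, 1, 365), "days since 1979-1-1 00:00:00")
format(res, "%Y-%m-%d")
# "1979-01-01" "1979-01-02" "1980-01-01"
```

With a real file, `values` would come from `ncvar_get()` and `units_string` from `ncatt_get(nc, "time", "units")$value`, as in the code above.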
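EDIT: One might also want to take a look at ncdf4.helpers::nc.get.time.series
EDIT2: Note that the newly proposed awesome stars package, currently in development, will handle dates automatically; see the first blog post for an example.
EDIT3: Another way is to use the units package directly, which is what stars uses. One could do it like this (this still does not handle calendars correctly, and I am not sure that units can):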
getNcTime <- function(nc) {  ## NEW VERSION, with the units package
  require(units)
  require(ncdf4)
  options(warn = 1)  # print warnings as they occur
  if (is.character(nc)) nc <- nc_open(nc)
  ncdims <- names(nc$dim)  # get netCDF dimension names
  timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))]  # find time variable(s)
  if (length(timevar) > 1) {
    warning(paste("Found more than one time var. Using the first:", timevar[1]))
    timevar <- timevar[1]
  }
  if (length(timevar) != 1) stop("ERROR! Could not identify the correct time variable")
  times <- ncvar_get(nc, timevar)  # get time data
  timeatt <- ncatt_get(nc, timevar)  # get attributes
  timeunit <- timeatt$units
  units(times) <- make_unit(timeunit)  # recent versions of units use as_units() instead of make_unit()
  as.POSIXct(times)
}
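I couldn't get @AF7's function to work with my files, so I wrote my own. The function below creates a POSIXct vector of dates, reading the start date, time interval, unit and length from the nc file. It works with nc files of many (but probably not every...) shapes or forms.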
ncdate <- function(nc) {
  ncdims <- names(nc$dim)  # Extract dimension names
  timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime",
                                        "date", "Date"))[1]]  # Pick the time dimension
  ntstep <- nc$dim[[timevar]]$len
  tm <- ncvar_get(nc, timevar)  # Extract the timestep count
  tunits <- ncatt_get(nc, timevar, "units")  # Extract the units attribute
  tspace <- tm[2] - tm[1]  # Time period between two timesteps, for the "by" argument
  tstr <- strsplit(tunits$value, " ")  # Extract string components of the time unit
  a <- unlist(tstr[1])  # Isolate the unit, i.e. seconds, hours, days etc.
  uname <- a[which(a %in% c("seconds", "hours", "days"))[1]]  # Check unit
  startd <- as.POSIXct(gsub(paste(uname, 'since '), '', tunits$value),
                       format = "%Y-%m-%d %H:%M:%S")  # Extract the start / origin date
  tmulti <- 3600  # Hourly multiplier for date
  if (uname == "days") tmulti <- 86400  # Daily multiplier for date
  # Rename "seconds" to "secs" for the "by" argument and change the multiplier.
  if (uname == "seconds") {
    uname <- "secs"
    tmulti <- 1
  }
  byt <- paste(tspace, uname)  # Define the "by" argument
  if (byt == "0.0416666679084301 days") {  # The unit is "days" but the "by" interval is in hours
    byt <- "1 hour"  # R won't understand "by < 1", so change by and unit to hour.
    uname <- "hours"
  }
  datev <- seq(from = as.POSIXct(startd + tm[1] * tmulti), by = byt, units = uname, length = ntstep)
}
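EDIT
To address the flaw highlighted by @AF7's comment, namely that the code above only works for regularly spaced files, datev can instead be calculated as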
datev <- as.POSIXct(tm*tmulti,origin=startd)
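To make the difference concrete, here is a small sketch with made-up, irregularly spaced offsets. The values of `tm` and `startd` are illustrative assumptions, not read from a file, and UTC is assumed for the origin.

```r
# Illustrative values only: offsets in days since the origin, not evenly spaced.
startd <- as.POSIXct("1979-01-01 00:00:00", tz = "UTC")
tm <- c(0, 1, 3, 10)
tmulti <- 86400  # seconds per day

# seq() extrapolates a constant step (here tm[2] - tm[1] = 1 day),
# so the third and fourth dates no longer match the stored offsets.
regular <- seq(from = startd + tm[1] * tmulti, by = "1 day", length.out = length(tm))

# Direct conversion uses every stored offset as it is.
datev <- as.POSIXct(tm * tmulti, origin = startd, tz = "UTC")

format(regular, "%Y-%m-%d")
# "1979-01-01" "1979-01-02" "1979-01-03" "1979-01-04"
format(datev, "%Y-%m-%d")
# "1979-01-01" "1979-01-02" "1979-01-04" "1979-01-11"
```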
I just discovered (two years after posting the question!) that there is a package called ncdf.tools which has the function:
convertDateNcdf2R
which
converts a time vector from a netCDF file or a vector of Julian days
(or seconds, minutes, hours) since a specified origin into a POSIXct R
vector.
Usage:
convertDateNcdf2R(time.source, units = "days", origin = as.POSIXct("1800-01-01",
tz = "UTC"), time.format = c("%Y-%m-%d", "%Y-%m-%d %H:%M:%S",
"%Y-%m-%d %H:%M", "%Y-%m-%d %Z %H:%M", "%Y-%m-%d %Z %H:%M:%S"))
Arguments:
time.source
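numeric vector or netCDF connection: either a number of time units since the origin or a netCDF file connection. In the latter case, the time vector is extracted from the netCDF file. This file, and especially the time variable, has to follow the CF netCDF conventions.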
units
character string: units of the time source. If the source is a netCDF file, this value is ignored and is read from that file.
origin
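POSIXct object: origin or day/hour zero of the time source. If the source is a netCDF file, this value is ignored and is read from that file.
So simply passing the netcdf connection as the first argument is enough, and the function handles the rest. Caveat: this only works if the netCDF file follows the CF conventions (for example, it will fail if your units are "years since" instead of "seconds since" or "days since").
More details are available here: https://rdrr.io/cran/ncdf.tools/man/convertDateNcdf2R.html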