Determine season from Date using lubridate in R
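I have a very large dataset with a DateTime column containing POSIXct values. I need to determine the season (winter or summer) based on the DateTime column. I created a function that works fine on small datasets but crashes when I use it on the large one. Can anyone spot my mistake?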
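I created 4 functions:
- 3 sub-functions that make the logical comparisons and the selection easy to use with the *apply functions
- 1 function that determines the season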
The functions are:
require(lubridate)
# function for logical comparison (to be used in *apply)
greaterOrEqual <- function(x,y){
ifelse(x >= y,T,F)
}
# function for logical comparison (to be used in *apply)
less <- function(x,y){
ifelse(x < y,T,F)
}
# function for selecting values by a logical vector (to be used in *apply)
selFromLogic <- function(VecLogic,VecValue){
VecValue[VecLogic]
}
# Main Function to determine the season
getTwoSeasons <- function(input.date) {
Winter1Start <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
Winter1End <- as.POSIXct("2000-04-15 23:59:59", tz = "UTC")
SummerStart <- Winter1End + 1
SummerEnd <- as.POSIXct("2000-10-15 23:59:59", tz = "UTC")
Winter2Start <- SummerEnd + 1
Winter2End <- as.POSIXct("2000-12-31 00:00:00", tz = "UTC")
year(input.date) <- year(Winter1Start)
attr(input.date, "tzone") <- attr(Winter1Start, "tzone")
SeasonStart <- c(Winter1Start,SummerStart,Winter2Start)
SeasonsEnd <- c(Winter1End,SummerEnd,Winter2End)
Season_names <- as.factor(c("WinterHalfYear","SummerHalfYear","WinterHalfYear"))
Season_select <- sapply(SeasonStart, greaterOrEqual, x = input.date) & sapply(SeasonsEnd, less, x = input.date)
Season_return <- apply(Season_select,MARGIN = 1,selFromLogic,VecValue = Season_names)
return(Season_return)
}
Here is how to test the function:
dates <- Sys.time() + seq(0,10000,10)
getTwoSeasons(dates)
Any help would be greatly appreciated; this is driving me crazy!
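After hours of debugging I found my mistake, and it is really absurd: if no season is found for a DateTime value, apply returns a list object instead of a vector. In the code above this happens for every value that, after being moved into the year 2000, falls on or after 2000-12-31 00:00:00, because Winter2End is an exclusive upper bound (the comparison is x < y). Returning a list caused the excessive increase in computation time and the crash described above. Here is the fixed code: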
# input a date and return one of 2 seasons
getTwoSeasons <- function(input.date) {
Winter1Start <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
Winter1End <- as.POSIXct("2000-04-15 23:59:59", tz = "UTC")
SummerStart <- Winter1End + 1
SummerEnd <- as.POSIXct("2000-10-15 23:59:59", tz = "UTC")
Winter2Start <- SummerEnd + 1
Winter2End <- as.POSIXct("2001-01-01 00:00:01", tz = "UTC")
SeasonStart <- c(Winter1Start,SummerStart,Winter2Start)
SeasonsEnd <- c(Winter1End,SummerEnd,Winter2End)
Season_names <- factor(c("WinterHalf","SummerHalf","WinterHalf"))
year(input.date) <- year(Winter1Start)
attr(input.date, "tzone") <- attr(Winter1Start, "tzone")
Season_selectStart <- vapply(X = SeasonStart,function(x,y){x <= input.date},FUN.VALUE = logical(length(input.date)),y = input.date)
Season_selectEnd <- vapply(X = SeasonsEnd,function(x,y){x > input.date},FUN.VALUE = logical(length(input.date)),y = input.date)
Season_selectBoth <- Season_selectStart & Season_selectEnd
Season_return <- apply(Season_selectBoth,MARGIN = 1,function(x,y){y[x]}, y = Season_names)
return(Season_return)
}
The "sub" functions are now integrated into the main function, and the two sapply calls have been replaced with vapply.
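The root cause can be reproduced with a small, made-up selection matrix (sel and seasons below are illustrative objects, not the asker's data). When every row has exactly one TRUE, apply() simplifies the per-row results into a vector; a single all-FALSE row yields character(0) for that row, the results no longer share a length, and apply() falls back to returning a list.

```r
# Toy selection matrix: rows = dates, columns = the three season intervals.
# The third row matches no interval, like a date on December 31 in the original code.
sel <- rbind(c(TRUE,  FALSE, FALSE),
             c(FALSE, TRUE,  FALSE),
             c(FALSE, FALSE, FALSE))
seasons <- c("WinterHalfYear", "SummerHalfYear", "WinterHalfYear")

# Every row matched: apply() simplifies to a character vector
res_ok <- apply(sel[1:2, ], MARGIN = 1, function(row) seasons[row])
is.character(res_ok)  # TRUE

# One row without a match: results have lengths 1, 1, 0, so apply() returns a list
res_bad <- apply(sel, MARGIN = 1, function(row) seasons[row])
is.list(res_bad)      # TRUE
```

Doing the per-row step with vapply(..., FUN.VALUE = character(1)) instead would stop with an error on such a row, which makes a gap in the season boundaries visible immediately rather than silently producing a list.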
PS: There is still a time-zone problem, because c() strips the time zone. I will update the code once I have fixed it.
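One possible way around both the c() time-zone issue and the list result, sketched under the assumption that lubridate is loaded: build all boundaries in a single as.POSIXct(..., tz = "UTC") call, so c() is never applied to POSIXct values, and let base R's findInterval() do the lookup, which always returns one integer per input. The name get_two_seasons_fi is hypothetical, and findInterval() is swapped in for the vapply/apply logic above; the half-year boundaries (April 16 and October 16) are the ones used in the fixed code.

```r
library(lubridate)

# Hypothetical name; a sketch, not the asker's fixed function
get_two_seasons_fi <- function(input.date) {
  # One as.POSIXct() call, so the vector keeps tz = "UTC" (no c() on POSIXct values)
  bounds <- as.POSIXct(c("2000-01-01 00:00:00",   # winter half starts
                         "2000-04-16 00:00:00",   # summer half starts
                         "2000-10-16 00:00:00",   # second winter half starts
                         "2001-01-01 00:00:00"),  # end of the reference year
                       tz = "UTC")
  # Keep the wall-clock time but relabel it as UTC, then move it into the year 2000
  x <- force_tz(input.date, tzone = "UTC")
  year(x) <- 2000
  # findInterval() returns i with bounds[i] <= x < bounds[i + 1], i.e. 1, 2 or 3 here
  idx <- findInterval(x, bounds)
  factor(c("WinterHalf", "SummerHalf", "WinterHalf")[idx],
         levels = c("WinterHalf", "SummerHalf"))
}
```

It can be called exactly like the original, e.g. get_two_seasons_fi(Sys.time() + seq(0, 10000, 10)), and returns a factor of the same length as its input.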
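The following strategy can also be used. The basic observation is that substr can extract the month and day information we need in order to decide whether it is summer or winter. The idea is then to convert this into a number of the form month.day, after which the test for summer boils down to the number being greater than 4.15 but less than 10.16.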
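The example below shows how this can be done: the vector of dates is first converted into the alternative representation described above, and then a vector indicating summer (TRUE) or winter (FALSE) is created from it.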
DateTime <- as.POSIXct(x = "2000-01-01 00:00:00",
tz = "UTC") +
(0:1000)*(60*60*24)
DateTime_2 <- as.numeric(paste(
substr(x = DateTime,
start = 6,
stop = 7),
substr(x = DateTime,
start = 9,
stop = 10),
sep = "."))
.season <- (DateTime_2 > 4.15) & (DateTime_2 < 10.16)
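As an assumed alternative, the two substr() calls can be replaced by a single format() call with the pattern "%m.%d", which yields the same "MM.DD" string directly. The helper name is_summer_md is hypothetical; the thresholds 4.15 and 10.16 are the ones from the answer above. Because the day always has two digits, dates such as October 1 (10.01) and October 10 (10.10) stay distinct numbers.

```r
# Hypothetical helper: TRUE for summer (April 16 to October 15), FALSE for winter
is_summer_md <- function(x) {
  # format() gives e.g. "04.15", which as.numeric() turns into 4.15
  md <- as.numeric(format(x, "%m.%d"))
  (md > 4.15) & (md < 10.16)
}

DateTime <- as.POSIXct(x = "2000-01-01 00:00:00", tz = "UTC") + (0:1000) * (60 * 60 * 24)
.season <- is_summer_md(DateTime)
```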
I wrapped @Lars Arne Jordanger's more elegant approach into a function:
library(lubridate)  # for month() and day()
getTwoSeasons <- function(input.date){
numeric.date <- 100*month(input.date)+day(input.date)
## input Seasons upper limits in the form MMDD in the "breaks =" option:
cuts <- base::cut(numeric.date, breaks = c(0,415,1015,1231))
# rename the resulting groups (could have been done within cut(..., labels =) if "Winter" weren't duplicated)
levels(cuts) <- c("Winter", "Summer","Winter")
return(cuts)
}
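For readers who would rather not depend on lubridate, here is a sketch of the same cut()-based idea using only base R: as.POSIXlt() exposes the month (mon, counted from 0) and the day of the month (mday). The name get_two_seasons_base is hypothetical. cut() uses right-closed intervals by default (right = TRUE), so April 15 (415) still falls in the first winter interval and October 15 (1015) in the summer interval; assigning the duplicated name "Winter" to two levels merges them into one.

```r
# Hypothetical lubridate-free variant of the cut()-based function above
get_two_seasons_base <- function(input.date) {
  lt <- as.POSIXlt(input.date)
  # MMDD as a number; $mon runs from 0 (January) to 11 (December)
  numeric.date <- 100 * (lt$mon + 1) + lt$mday
  # right-closed intervals: (0,415], (415,1015], (1015,1231]
  cuts <- cut(numeric.date, breaks = c(0, 415, 1015, 1231))
  # giving two levels the same name merges them into a single "Winter" level
  levels(cuts) <- c("Winter", "Summer", "Winter")
  cuts
}
```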
Testing it on some sample data, it seems to work fine:
getTwoSeasons(as.POSIXct("2016-01-01 12:00:00")+(0:365)*(60*60*24))
If you are interested in getting back four seasons, use the following code:
library(lubridate)
getSeason <- function(input.date){
numeric.date <- 100*month(input.date)+day(input.date)
## input Seasons upper limits in the form MMDD in the "breaks =" option:
cuts <- base::cut(numeric.date, breaks = c(0,319,0620,0921,1220,1231))
# rename the resulting groups (could have been done within cut(..., labels =) if "Winter" weren't duplicated)
levels(cuts) <- c("Winter","Spring","Summer","Fall","Winter")
return(cuts)
}
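A note on the breaks: R has no octal literals, so 0620 and 0921 are simply read as 620 and 921. As an alternative sketch that avoids the duplicated-level trick, base R's findInterval() with left.open = TRUE reproduces the same right-closed intervals and can index a plain label vector. get_season_fi is a hypothetical name; it uses as.POSIXlt() instead of lubridate and returns a character vector rather than a factor.

```r
# Hypothetical variant of getSeason() using findInterval() instead of cut()
get_season_fi <- function(input.date) {
  lt <- as.POSIXlt(input.date)
  numeric.date <- 100 * (lt$mon + 1) + lt$mday   # MMDD, e.g. 320 for March 20
  breaks <- c(0, 319, 620, 921, 1220, 1231)       # same upper limits as above
  labels <- c("Winter", "Spring", "Summer", "Fall", "Winter")
  # left.open = TRUE: index i satisfies breaks[i] < numeric.date <= breaks[i + 1]
  labels[findInterval(numeric.date, breaks, left.open = TRUE)]
}
```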
Unit test:
getSeason(as.POSIXct("2016-01-01 12:00:00")+(0:365)*(60*60*24))
For completeness, it is worth noting that lubridate now has a quarter (and a semester) function. quarter divides the year into quarters, and semester divides it into halves:
library(lubridate)
quarter(x, with_year = FALSE, fiscal_start = 1)
semester(x, with_year = FALSE)
For more information, see: https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/quarter
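A short usage sketch with default arguments only (the optional arguments have changed between lubridate versions, so with_year and fiscal_start are left out here). Both functions split the calendar at month boundaries, so neither reproduces the April 15 / October 15 split from the question: semester() puts all of January to June, including the summer-half date April 20 below, into half 1.

```r
library(lubridate)

x <- as.Date(c("2016-01-15", "2016-04-20", "2016-08-01", "2016-11-30"))
quarter(x)   # 1 2 3 4
semester(x)  # 1 1 2 2
```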
Use POSIXlt instead of POSIXct.
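I made my own function based on the season definitions I use. I created a vector called normal for non-leap years and a vector called leap for leap years, in which each season name is repeated as many times as it occurs, counting from January 1. Then I created the following function.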
library(lubridate)  # for leap_year()
SEASON <- function(datee){
datee <- as.POSIXlt(datee)
# one season label per day of the year, counted from January 1
normal <- rep(c("Winter","Spring","Summer","Monsoon","Autumn","Winter"), c(46,44,91,77,76,31))  # 365 days
leap <- rep(c("Winter","Spring","Summer","Monsoon","Autumn","Winter"), c(46,45,91,77,76,31))    # 366 days
# yday counts from 0, hence the +1; ifelse() (instead of if) also handles a vector of dates from different years
season <- ifelse(leap_year(datee), leap[datee$yday+1], normal[datee$yday+1])
return(season)
}
Let's test it on a data set.
Dates <- seq(as.POSIXct("2000-01-01"), as.POSIXct("2010-01-01"), by= "day")
sapply(Dates, SEASON)
It works.
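To see which calendar dates the day counts in normal and leap correspond to, the first day of each block can be computed from the counts with cumsum(). season_starts is a hypothetical helper written only for this check. With these counts the seasons start on January 1, February 16, April 1, July 1, September 16 and December 1 in both kinds of year, because the extra leap day is added to Spring (45 instead of 44 days).

```r
# Hypothetical helper: first calendar day of each block in a rep() count vector
season_starts <- function(counts, year) {
  first_day <- as.Date(sprintf("%d-01-01", year))
  # 1-based position of the first element of each block, turned into a 0-based day offset
  offsets <- cumsum(c(1, head(counts, -1))) - 1
  data.frame(season = c("Winter", "Spring", "Summer", "Monsoon", "Autumn", "Winter"),
             start  = first_day + offsets)
}

season_starts(c(46, 44, 91, 77, 76, 31), 2001)  # counts of the normal vector
season_starts(c(46, 45, 91, 77, 76, 31), 2000)  # counts of the leap vector
```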