Match vector values with elements in list in R
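I am trying to match a vector of months to the appropriate quarter in R. Unfortunately, the code I inherited stores the quarters in a list, with the months belonging to each quarter held as a vector in the corresponding list element (this should at least be adaptable, so you can work with quarters, trimesters or semesters as needed). At the moment I am using sapply to loop over the vector and match each month to its quarter, like so: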
month.vec <- sample(1:12, 100, replace = TRUE)
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
month.to.quarter <- function(months, quarters) {
  sapply(months, FUN = function(x) {
    # find the list element containing x and take the first character of its name
    as.numeric(substr(names(which(x == unlist(quarters))), 0, 1))
  })
}
month.to.quarter(month.vec, quarters.list)
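This works quite well for vectors of roughly length(month.vec) < 1e5, but beyond that it becomes rather time-consuming (see the code below). Does anyone have an elegant solution for this kind of matching on longer vectors?
Script showing how processing time grows with vector length. Note: this takes a few seconds (<10).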
times <- NULL
for (i in c(10 %o% 10^(2:5))) {
  month.vec <- sample(1:12, i, replace = TRUE)
  quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
  # element 3 of system.time() is the elapsed time
  t <- system.time(a <- month.to.quarter(month.vec, quarters.list))[3]
  time <- data.frame(n = i, time = t)
  times <- rbind(times, time)
}
plot(time ~ n, times)
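Here is the first approach that came to mind. I think I saw it in Hadley's book. It uses a named lookup vector indexed by month. (Because month.vec is an integer vector, the indexing below is actually positional; the names would only come into play if the months were character strings.)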
month.vec <- sample(1:12, 10000, replace = TRUE)
quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
# your method
month.to.quarter <- function(months, quarters) {
  sapply(months, FUN = function(x) {
    as.numeric(substr(names(which(x == unlist(quarters))), 0, 1))
  })
}
out1 <- month.to.quarter(month.vec, quarters.list)
# my method
vec <- rep(1:4, each = 3)
names(vec) <- 1:12
out2 <- vec[month.vec]
names(out2) <- NULL
all.equal(out1, out2) # this will return TRUE
The benchmarks are dramatically different.
month.vec <- sample(1:12, 10000, replace = TRUE)
microbenchmark::microbenchmark(vec[month.vec],
                               month.to.quarter(month.vec, quarters.list))
## Unit: microseconds
##                                        expr       min        lq       mean    median        uq        max neval
##                              vec[month.vec]   108.503   112.433   119.3982   116.916   119.983    183.467   100
##  month.to.quarter(month.vec, quarters.list) 78859.160 84036.995 87956.6532 86960.269 89975.668 140797.487   100
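The new method is roughly 750 times faster (comparing medians: about 87 ms versus about 0.12 ms).
If you want to wrap it in a function, it looks like this, and it is still quite fast: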
month.to.quarter2 <- function(months) {
  vec <- rep(1:4, each = 3)
  names(vec) <- 1:12
  out <- vec[months]
  names(out) <- NULL
  return(out)
}
microbenchmark::microbenchmark(vec[month.vec],
                               month.to.quarter(month.vec, quarters.list),
                               month.to.quarter2(month.vec))
## Unit: microseconds
##                                        expr       min         lq       mean    median        uq        max neval
##                              vec[month.vec]   109.222   111.6345   121.3035   115.604   117.916    706.034   100
##  month.to.quarter(month.vec, quarters.list) 77292.742 83032.7425 85770.6963 84690.500 87243.327 138531.309   100
##                month.to.quarter2(month.vec)   117.264   120.3555   127.6535   127.021   133.474    153.556   100
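I wondered whether it would be faster to invert the quarters list, so that the month can be used as an index to look up the quarter. Something like this...
quarters <- as.numeric(substr(names(sort(unlist(quarters.list))), 1, 1))
This only needs to be done once; after that you can simply do
quarters.vec <- quarters[month.vec]
That is about 2000 times faster...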
microbenchmark::microbenchmark(quarters[month.vec], month.to.quarter(month.vec, quarters.list))
Unit: microseconds
                                       expr        min         lq        mean     median          uq        max neval
                        quarters[month.vec]    199.836    202.629    235.3968    227.763    233.9695    554.823   100
 month.to.quarter(month.vec, quarters.list) 439466.006 456649.059 495957.5722 469543.098 499346.5020 935046.664   100
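One caveat with the substr() inversion: it relies on every list name being a single character and on the months sorting into positions 1..12. Below is a minimal sketch of a more general inversion that avoids parsing the names produced by unlist(). It assumes the list names are numeric labels, and the helper name invert.periods is made up for illustration; it is not part of the original code.

```r
# Hypothetical helper (not from the original post): build a
# month -> period lookup vector from a named list of month vectors.
# Assumes the list names can be converted with as.numeric().
invert.periods <- function(periods) {
  months <- unlist(periods, use.names = FALSE)
  # repeat each list name once per month it contains
  labels <- rep(as.numeric(names(periods)), lengths(periods))
  lookup <- rep(NA_real_, max(months))  # months missing from the list stay NA
  lookup[months] <- labels
  lookup
}

quarters.list <- list(`1` = 1:3, `2` = 4:6, `3` = 7:9, `4` = 10:12)
quarters <- invert.periods(quarters.list)      # built once
month.vec <- sample(1:12, 100, replace = TRUE)
quarters.vec <- quarters[month.vec]            # vectorised positional lookup
```

The same helper should work unchanged for semesters, for example invert.periods(list(`1` = 1:6, `2` = 7:12)), which keeps the adaptability the question asks for.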
Try this:
(month.vec - 1) %/% 3 + 1
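This arithmetic shortcut ignores the inherited list entirely, so it only applies when the periods are consecutive, equal-length blocks of months starting in January. A small sketch of that idea with the block length as a parameter; the function name month.to.period and its block argument are invented here for illustration:

```r
# Hypothetical wrapper around the integer-division trick.
# Assumes periods are equal-length, consecutive blocks starting at month 1.
month.to.period <- function(months, block = 3) {
  (months - 1) %/% block + 1
}

month.to.period(1:12)             # quarters:  1 1 1 2 2 2 3 3 3 4 4 4
month.to.period(1:12, block = 6)  # semesters: 1 1 1 1 1 1 2 2 2 2 2 2
```

If the periods are irregular (for example a fiscal year that does not start in January), a lookup vector built from the list is the safer choice.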