向量对之间的最小绝对差(贪心算法)
Minimum absolute difference between vector pairs (greedy algorithm)
给定一个数值向量,我想找到大小为 2 的组合的最小绝对差。但是,摩擦点来自于使用 combn
创建包含对的矩阵。当 matrix/vector 太大时,如何处理问题?
当使用 combn
的结果对数(列数)太大时,出现以下错误:
Error in matrix(r, nrow = len.r, ncol = count) :
invalid 'ncol' value (too large or NA)
This post states that the size limit of a matrix is roughly one billion rows and two columns.
这是我用过的代码。对于在我的函数输出中使用 cat
表示歉意——我正在解决 HackerRank 中的 Minimum Absolute Difference in an Array 贪心算法问题,并且 R 输出只有在使用 cat
给出时才算正确:
minimumAbsoluteDifference <- function(arr) {
combos <- combn(arr, 2)
cat(min(abs(combos[1,] - combos[2,])))
}
# This works fine
input0 <- c(3, -7, 0)
minimumAbsoluteDifference(input0) #returns 3
# This fails
inputFail <- rpois(10e4, 1)
minimumAbsoluteDifference(inputFail)
#Error in matrix(r, nrow = len.r, ncol = count) :
# invalid 'ncol' value (too large or NA)
是这样的吗?
minimumAbsoluteDifference2 <- function(x){
stopifnot(length(x) >= 2)
n <- length(x)
inx <- rep(TRUE, n)
m <- NULL
for(i in seq_along(x)[-n]){
inx[i] <- FALSE
curr <- abs(x[i] - x[which(inx)])
m <- min(c(m, curr))
}
m
}
# This works fine
input0 <- c(3, -7, 0)
minimumAbsoluteDifference(input0) #returns 3
minimumAbsoluteDifference2(input0) #returns 3
set.seed(2020)
input1 <- rpois(1e3, 1)
minimumAbsoluteDifference(input1) #returns 0
minimumAbsoluteDifference2(input1) #returns 0
inputFail <- rpois(1e5, 1)
minimumAbsoluteDifference(inputFail) # This fails
minimumAbsoluteDifference2(inputFail) # This does not fail
TL;DR
不需要combn
之类的,简单的:
min(abs(diff(sort(v))))
实事求是
找到每个可能组合之间的差异是 O(n^2)
。因此,当我们得到长度为 1e5
的向量时,这项任务在计算和内存方面都是繁重的。
我们需要一种不同的方法。
How about sorting and taking the difference only with its neighbor?
通过第一次排序,对于任意元素vj,差值min |v j - vj -/+ 1| 将是涉及 vj[= 的最小差异60=]。例如,给定排序向量 v
:
v = -9 -8 -6 -4 -2 3 8
距 -2
的最小距离由下式给出:
|-2 - 3| = 5
|-4 - -2| = 2
不需要检查任何其他元素。
这很容易在基础 R
中实现,如下所示:
getAbsMin <- function(v) min(abs(diff(sort(v))))
我不会像使用任何大小合理的向量那样使用 rpois
,将产生重复项,这将简单地给出 0
作为答案。更明智的测试是 runif
或 sample
(minimumAbsoluteDifference2
来自@RuiBarradas 提供的答案):
set.seed(1729)
randUnif100 <- lapply(1:100, function(x) {
runif(1e3, -100, 100)
})
randInts100 <- lapply(1:100, function(x) {
sample(-(1e9):(1e9), 1e3)
})
head(sapply(randInts100, getAbsMin))
[1] 586 3860 2243 2511 5186 3047
identical(sapply(randInts100, minimumAbsoluteDifference2),
sapply(randInts100, getAbsMin))
[1] TRUE
options(scipen = 99)
head(sapply(randUnif100, getAbsMin))
[1] 0.00018277206 0.00020549633 0.00009834766 0.00008395873 0.00005299225 0.00009313226
identical(sapply(randUnif100, minimumAbsoluteDifference2),
sapply(randUnif100, getAbsMin))
[1] TRUE
它也非常快:
library(microbenchmark)
microbenchmark(a = getAbsMin(randInts100[[50]]),
b = minimumAbsoluteDifference2(randInts100[[50]]),
times = 25, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
a 1.0000 1.0000 1.0000 1.0000 1.00000 1.00000 25
b 117.9799 113.2221 105.5144 107.6901 98.55391 81.05468 25
即使对于非常大的向量,结果也是即时的;
set.seed(321)
largeTest <- sample(-(1e12):(1e12), 1e6)
system.time(print(getAbsMin(largeTest)))
[1] 3
user system elapsed
0.083 0.003 0.087
给定一个数值向量,我想找到大小为 2 的组合的最小绝对差。但是,摩擦点来自于使用 combn
创建包含对的矩阵。当 matrix/vector 太大时,如何处理问题?
当使用 combn
的结果对数(列数)太大时,出现以下错误:
Error in matrix(r, nrow = len.r, ncol = count) : invalid 'ncol' value (too large or NA)
This post states that the size limit of a matrix is roughly one billion rows and two columns.
这是我用过的代码。对于在我的函数输出中使用 cat
表示歉意——我正在解决 HackerRank 中的 Minimum Absolute Difference in an Array 贪心算法问题,并且 R 输出只有在使用 cat
给出时才算正确:
minimumAbsoluteDifference <- function(arr) {
combos <- combn(arr, 2)
cat(min(abs(combos[1,] - combos[2,])))
}
# This works fine
input0 <- c(3, -7, 0)
minimumAbsoluteDifference(input0) #returns 3
# This fails
inputFail <- rpois(10e4, 1)
minimumAbsoluteDifference(inputFail)
#Error in matrix(r, nrow = len.r, ncol = count) :
# invalid 'ncol' value (too large or NA)
是这样的吗?
minimumAbsoluteDifference2 <- function(x){
stopifnot(length(x) >= 2)
n <- length(x)
inx <- rep(TRUE, n)
m <- NULL
for(i in seq_along(x)[-n]){
inx[i] <- FALSE
curr <- abs(x[i] - x[which(inx)])
m <- min(c(m, curr))
}
m
}
# This works fine
input0 <- c(3, -7, 0)
minimumAbsoluteDifference(input0) #returns 3
minimumAbsoluteDifference2(input0) #returns 3
set.seed(2020)
input1 <- rpois(1e3, 1)
minimumAbsoluteDifference(input1) #returns 0
minimumAbsoluteDifference2(input1) #returns 0
inputFail <- rpois(1e5, 1)
minimumAbsoluteDifference(inputFail) # This fails
minimumAbsoluteDifference2(inputFail) # This does not fail
TL;DR
不需要combn
之类的,简单的:
min(abs(diff(sort(v))))
实事求是
找到每个可能组合之间的差异是 O(n^2)
。因此,当我们得到长度为 1e5
的向量时,这项任务在计算和内存方面都是繁重的。
我们需要一种不同的方法。
How about sorting and taking the difference only with its neighbor?
通过第一次排序,对于任意元素vj,差值min |v j - vj -/+ 1| 将是涉及 vj[= 的最小差异60=]。例如,给定排序向量 v
:
v = -9 -8 -6 -4 -2 3 8
距 -2
的最小距离由下式给出:
|-2 - 3| = 5
|-4 - -2| = 2
不需要检查任何其他元素。
这很容易在基础 R
中实现,如下所示:
getAbsMin <- function(v) min(abs(diff(sort(v))))
我不会像使用任何大小合理的向量那样使用 rpois
,将产生重复项,这将简单地给出 0
作为答案。更明智的测试是 runif
或 sample
(minimumAbsoluteDifference2
来自@RuiBarradas 提供的答案):
set.seed(1729)
randUnif100 <- lapply(1:100, function(x) {
runif(1e3, -100, 100)
})
randInts100 <- lapply(1:100, function(x) {
sample(-(1e9):(1e9), 1e3)
})
head(sapply(randInts100, getAbsMin))
[1] 586 3860 2243 2511 5186 3047
identical(sapply(randInts100, minimumAbsoluteDifference2),
sapply(randInts100, getAbsMin))
[1] TRUE
options(scipen = 99)
head(sapply(randUnif100, getAbsMin))
[1] 0.00018277206 0.00020549633 0.00009834766 0.00008395873 0.00005299225 0.00009313226
identical(sapply(randUnif100, minimumAbsoluteDifference2),
sapply(randUnif100, getAbsMin))
[1] TRUE
它也非常快:
library(microbenchmark)
microbenchmark(a = getAbsMin(randInts100[[50]]),
b = minimumAbsoluteDifference2(randInts100[[50]]),
times = 25, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
a 1.0000 1.0000 1.0000 1.0000 1.00000 1.00000 25
b 117.9799 113.2221 105.5144 107.6901 98.55391 81.05468 25
即使对于非常大的向量,结果也是即时的;
set.seed(321)
largeTest <- sample(-(1e12):(1e12), 1e6)
system.time(print(getAbsMin(largeTest)))
[1] 3
user system elapsed
0.083 0.003 0.087