向量对之间的最小绝对差(贪心算法)

Minimum absolute difference between vector pairs (greedy algorithm)

给定一个数值向量,我想找到大小为 2 的组合的最小绝对差。但是,摩擦点来自于使用 combn 创建包含对的矩阵。当 matrix/vector 太大时,如何处理问题?

当使用 combn 的结果对数(列数)太大时,出现以下错误:

Error in matrix(r, nrow = len.r, ncol = count) : invalid 'ncol' value (too large or NA)

This post states that the size limit of a matrix is roughly one billion rows and two columns.

这是我用过的代码。对于在我的函数输出中使用 cat 表示歉意——我正在解决 HackerRank 中的 Minimum Absolute Difference in an Array 贪心算法问题,并且 R 输出只有在使用 cat 给出时才算正确:

minimumAbsoluteDifference <- function(arr) {
  combos <- combn(arr, 2)
  
  cat(min(abs(combos[1,] - combos[2,])))
}

# This works fine
input0 <- c(3, -7, 0)

minimumAbsoluteDifference(input0) #returns 3

# This fails
inputFail <- rpois(10e4, 1)

minimumAbsoluteDifference(inputFail) 
  #Error in matrix(r, nrow = len.r, ncol = count) : 
  # invalid 'ncol' value (too large or NA)

是这样的吗?

minimumAbsoluteDifference2 <- function(x){
  stopifnot(length(x) >= 2)
  n <- length(x)
  inx <- rep(TRUE, n)
  m <- NULL
  for(i in seq_along(x)[-n]){
    inx[i] <- FALSE
    curr <- abs(x[i] - x[which(inx)])
    m <- min(c(m, curr))
  }
  m
}
# This works fine
input0 <- c(3, -7, 0)

minimumAbsoluteDifference(input0)  #returns 3
minimumAbsoluteDifference2(input0) #returns 3


set.seed(2020)
input1 <- rpois(1e3, 1)
minimumAbsoluteDifference(input1)  #returns 0
minimumAbsoluteDifference2(input1) #returns 0

inputFail <- rpois(1e5, 1)
minimumAbsoluteDifference(inputFail)   # This fails
minimumAbsoluteDifference2(inputFail)  # This does not fail

TL;DR

不需要combn之类的,简单的:

min(abs(diff(sort(v))))

实事求是

找到每个可能组合之间的差异是 O(n^2)。因此,当我们得到长度为 1e5 的向量时,这项任务在计算和内存方面都是繁重的。

我们需要一种不同的方法。

How about sorting and taking the difference only with its neighbor?

通过第一次排序,对于任意元素vj,差值min |v j - vj -/+ 1| 将是涉及 vj[= 的最小差异60=]。例如,给定排序向量 v:

v = -9 -8 -6 -4 -2  3  8

-2 的最小距离由下式给出:

|-2 - 3| = 5
|-4 - -2| = 2

不需要检查任何其他元素。

这很容易在基础 R 中实现,如下所示:

getAbsMin <- function(v) min(abs(diff(sort(v))))

我不会像使用任何大小合理的向量那样使用 rpois,将产生重复项,这将简单地给出 0 作为答案。更明智的测试是 runifsampleminimumAbsoluteDifference2 来自@RuiBarradas 提供的答案):

set.seed(1729)
randUnif100 <- lapply(1:100, function(x) {
    runif(1e3, -100, 100)
})

randInts100 <- lapply(1:100, function(x) {
    sample(-(1e9):(1e9), 1e3)
})

head(sapply(randInts100, getAbsMin))
[1]  586 3860 2243 2511 5186 3047

identical(sapply(randInts100, minimumAbsoluteDifference2),
          sapply(randInts100, getAbsMin))
[1] TRUE

options(scipen = 99)
head(sapply(randUnif100, getAbsMin))
[1] 0.00018277206 0.00020549633 0.00009834766 0.00008395873 0.00005299225 0.00009313226

identical(sapply(randUnif100, minimumAbsoluteDifference2),
          sapply(randUnif100, getAbsMin))
[1] TRUE

它也非常快:

library(microbenchmark)
microbenchmark(a = getAbsMin(randInts100[[50]]),
               b = minimumAbsoluteDifference2(randInts100[[50]]),
               times = 25, unit = "relative")
Unit: relative
expr      min       lq     mean   median       uq      max neval
   a   1.0000   1.0000   1.0000   1.0000  1.00000  1.00000    25
   b 117.9799 113.2221 105.5144 107.6901 98.55391 81.05468    25

即使对于非常大的向量,结果也是即时的;

set.seed(321)
largeTest <- sample(-(1e12):(1e12), 1e6)

system.time(print(getAbsMin(largeTest)))
[1] 3
 user  system elapsed 
0.083   0.003   0.087