What is the fastest way to perform multiple logical comparisons in R?

Question

What is the fastest way to perform multiple logical comparisons in R?

For example, consider the vector x:

set.seed(14)
x = sample(LETTERS[1:4], size=10, replace=TRUE)

I want to test whether each entry of x is either "A" or "B" (and nothing else). The following works:

x == "A" | x == "B"
[1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

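The code above loops over the entire length of the vector three times: once for each comparison and once more for the `|`. Is there a way in R to loop only once and test whether each item satisfies one condition or the other?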

Answer 1

If your objective is just to make a single pass, this is quite simple to write with Rcpp, even if you don't have much C++ experience:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::LogicalVector single_pass(Rcpp::CharacterVector x, Rcpp::String a, Rcpp::String b) {
  R_xlen_t i = 0, n = x.size();
  Rcpp::LogicalVector result(n);

  for ( ; i < n; i++) {
    result[i] = (x[i] == a || x[i] == b);
  }

  return result;
}
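To call this function from R, the C++ source has to be compiled first. The following is a minimal sketch of one way to do that with `Rcpp::sourceCpp()`, assuming the Rcpp package and a working C++ toolchain are installed; the input vector `small` is a hypothetical example, not part of the original answer.

```r
# Assumes the Rcpp package and a C++ compiler toolchain are installed.
library(Rcpp)

# The C++ source from the answer, passed as a string; sourceCpp() compiles it
# and exports single_pass() into the R session.
sourceCpp(code = '
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::LogicalVector single_pass(Rcpp::CharacterVector x, Rcpp::String a, Rcpp::String b) {
  R_xlen_t i = 0, n = x.size();
  Rcpp::LogicalVector result(n);

  for ( ; i < n; i++) {
    result[i] = (x[i] == a || x[i] == b);
  }

  return result;
}
')

# Small hypothetical input to confirm the function is callable from R.
small <- c("A", "C", "B", "D")
single_pass(small, "A", "B")
```

The same source can instead be saved to a file and compiled by passing the file path to `sourceCpp()`.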

For an object as small as the one in your example, the slight overhead of .Call (presumably) masks the speed of the Rcpp version:

r_fun <- function(X) X == "A" | X == "B"
##
cpp_fun <- function(X) single_pass(X, "A", "B")
##
all.equal(r_fun(x), cpp_fun(x))
#[1] TRUE
microbenchmark::microbenchmark(
  r_fun(x), cpp_fun(x), times = 1000L)
#Unit: microseconds
#expr         min    lq     mean median     uq    max neval
#r_fun(x)   1.499 1.584 1.974156 1.6795 1.8535 37.903  1000
#cpp_fun(x) 1.860 2.334 3.042671 2.7450 3.1140 51.870  1000

But for larger vectors (which I assume is your real intent), it is considerably faster:

x2 <- sample(LETTERS, 10E5, replace = TRUE)
##
all.equal(r_fun(x2), cpp_fun(x2))
# [1] TRUE
microbenchmark::microbenchmark(
  r_fun(x2), cpp_fun(x2), times = 200L)
#Unit: milliseconds
#expr              min        lq      mean    median        uq      max neval
#r_fun(x2)   78.044518 79.344465 83.741901 80.999538 86.368627 149.5106   200
#cpp_fun(x2)  7.104929  7.201296  7.797983  7.605039  8.184628  10.7250   200

Here is a quick attempt at generalizing the above, in case it is of any use to you.
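Separately from that attempt, one base R way to test against any number of values is the `%in%` operator. The sketch below is only an illustration and not the author's generalization; `in_set` is a hypothetical helper name, and whether it beats the Rcpp version on large vectors would need to be benchmarked.

```r
# Generalized membership test: TRUE where an element of x equals any target.
# in_set is a hypothetical helper name used only for this illustration.
in_set <- function(x, targets) x %in% targets

x <- c("A", "C", "B", "D", "A")
in_set(x, c("A", "B"))
# [1]  TRUE FALSE  TRUE FALSE  TRUE

# For input without NA, it agrees with the chained comparison from the question.
stopifnot(identical(in_set(x, c("A", "B")), x == "A" | x == "B"))
```

One difference to keep in mind: for an `NA` element, `%in%` returns `FALSE`, whereas `x == "A" | x == "B"` returns `NA`.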
