RStudio error - creating large environment object: protect(): protection stack overflow

I want to create a large lookup table of key-value pairs, and attempted to do so like this:

# actual use case is length ~5 million
key <- do.call(paste0, Map(stringi::stri_rand_strings, n=2e5, length = 16))
val <- sample.int(750, size = 2e5, replace = T)

make_dict <- function(keys, values){
  require(rlang)
  e <- new.env(size = length(keys))
  l <- list2(!!!setNames(values, keys))
  list2env(l, envir = e, hash = T) # problem in here...?
}

d <- make_dict(key, val)

Problem

When make_dict is run, it throws Error: protect(): protection stack overflow. This happens specifically in RStudio, when the input is a vector of length greater than 49991, and it seems very similar to this Stack Overflow post.

However, when I run an accessor function to fetch some values, it seems that make_dict ran fine after all, since I cannot find anything odd in its result:

`%||%` <- function(x,y) if(is.null(x)) y else x
grab <- function(...){
  vector("integer", length(..2)) |>
    (\(.){. = Vectorize(\(e, x) e[[x]] %||% NA_integer_, list("x"), T, F)(..1, ..2); .})()
}
out <- vector("integer", length(key))
out <- grab(d, sample(key)) # using sample to scramble the keys

anyNA(out) | !lobstr::obj_size(out) == lobstr::obj_size(val)
[1] FALSE
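
For comparison, the same kind of lookup can be sketched in base R at a much smaller scale. This is an illustrative alternative, not the code above: the names make_dict_base and grab_base are hypothetical, and mget() with ifnotfound stands in for the Vectorize-based accessor.

```r
# Hypothetical base-R variant of make_dict (no rlang dependency).
make_dict_base <- function(keys, values) {
  # as.list() turns the vector into a list; setNames() attaches the keys
  list2env(setNames(as.list(values), keys),
           envir = new.env(hash = TRUE, size = length(keys)))
}

# Hypothetical vectorised accessor: keys that are not bound become NA_integer_.
grab_base <- function(e, keys) {
  unlist(mget(keys, envir = e, ifnotfound = NA_integer_), use.names = FALSE)
}

keys <- sprintf("key%04d", 1:1000)   # small, deterministic demo keys
vals <- sample.int(750, size = 1000, replace = TRUE)

d_small <- make_dict_base(keys, vals)
idx <- sample(length(keys))          # scramble the lookup order
out_small <- grab_base(d_small, keys[idx])
```

Comparing out_small against vals[idx] checks the values themselves rather than only their size and the absence of NA.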

Running the same code in RGui does not throw the error.

Quirks

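  1. For sizes > 5e4, the d environment object does not show up in RStudio's Environment pane.
  2. The R console quickly returns to > (indicating the function has finished), but is unresponsive until the error is thrown.
  3. The error is thrown both when manually setting options(expressions = 5e5) and when keeping the default value of 5000.
  4. The time it takes for the error to be thrown is proportional to the size of the input vector.
  5. tryCatch(make_dict(key, val), error = function(e) e) does not catch the error.
  6. The error also occurs if the code is run from a package (a packaged version is available via remotes::install_github("D-Se/minimal")).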
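
Quirks 2 and 5 are consistent with the error being raised after make_dict() has already returned, rather than inside it. A minimal sketch of how that could be checked follows, using a small stand-in constructor (build_env is a hypothetical name, not the make_dict above) and assuming the result should be an environment with one binding per key:

```r
# Hypothetical stand-in for make_dict, kept small so it runs anywhere.
build_env <- function(keys, values) {
  list2env(setNames(as.list(values), keys), envir = new.env(hash = TRUE))
}

keys <- paste0("k", seq_len(100))
vals <- seq_len(100)

# If the constructor itself failed, `res` would be a condition object
# instead of an environment.
res <- tryCatch(build_env(keys, vals), error = function(e) e)

returned_ok <- is.environment(res) && !inherits(res, "error")
all_bound   <- returned_ok && length(res) == length(keys)
```

If both flags are TRUE while the console still reports the overflow, the failure more likely comes from code that runs after the call returns than from the constructor itself.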

Question

What is going on here? How can such an error be troubleshot?

options(error = traceback), as suggested here, didn't give any results. Inserting a browser() after list2env in the make_dict function throws the error long after the browser has opened. A traceback() gives the function .rs.describeObject, which is used to generate the summary in the Environment pane, and can be found here.

traceback()

# .rs.describeObject
(function (env, objName, computeSize = TRUE) 
   {
       obj <- get(objName, env)
       hasNullPtr <- .Call("rs_hasExternalPointer", obj, TRUE, PACKAGE = "(embedding)")
       if (hasNullPtr) {
           val <- "<Object with null pointer>"
           desc <- "An R object containing a null external pointer"
           size <- 0
           len <- 0
       }
       else {
           val <- "(unknown)"
           desc <- ""
           size <- if (computeSize) 
               object.size(obj)
           else 0
           len <- length(obj)
       }
       class <- .rs.getSingleClass(obj)
       contents <- list()
       contents_deferred <- FALSE
       if (is.language(obj) || is.symbol(obj)) {
           val <- deparse(obj)
       }
       else if (!hasNullPtr) {
           if (size > 524288) {
               len_desc <- if (len > 1) 
                   paste(len, " elements, ", sep = "")
               else ""
               if (is.data.frame(obj)) {
                   val <- "NO_VALUE"
                   desc <- .rs.valueDescription(obj)
               }
               else {
                   val <- paste("Large ", class, " (", len_desc, 
                     format(size, units = "auto", standard = "SI"), 
                     ")", sep = "")
               }
               contents_deferred <- TRUE
           }
           else {
               val <- .rs.valueAsString(obj)
               desc <- .rs.valueDescription(obj)
               if (class == "data.table" || class == "ore.frame" || 
                   class == "cast_df" || class == "xts" || class == 
                   "DataFrame" || is.list(obj) || is.data.frame(obj) || 
                   isS4(obj)) {
                   if (computeSize) {
                     contents <- .rs.valueContents(obj)
                   }
                   else {
                     val <- "NO_VALUE"
                     contents_deferred <- TRUE
                   }
               }
           }
       }
       list(name = .rs.scalar(objName), type = .rs.scalar(class), 
           clazz = c(class(obj), typeof(obj)), is_data = .rs.scalar(is.data.frame(obj)), 
           value = .rs.scalar(val), description = .rs.scalar(desc), 
           size = .rs.scalar(size), length = .rs.scalar(len), contents = contents, 
           contents_deferred = .rs.scalar(contents_deferred))
   })(<environment>, "d", TRUE)

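This GitHub issue, pointed out by @technocrat, discusses a known bug in earlier versions of RStudio concerning disabling the null external pointer check. It has since been resolved by adding an extra preference check in .rs.describeObject():
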
.rs.readUiPref("check_null_external_pointers")

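To check whether the code is being run from RStudio, and whether that version is older than a given version number (here I use the current official release), the following can be used in a function, or in a package's .onAttach: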

# RSTUDIO is set in the environment when the session runs inside RStudio
if (!is.na(Sys.getenv("RSTUDIO", unset = NA)) &&
    .rs.api.versionInfo()$version < "2021.9.1.372") {
  # warning or action
}
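
The comparison above relies on the version being a numeric_version object rather than a plain string. A minimal sketch of the same check as a standalone helper follows. The name is_old_rstudio is hypothetical and the threshold is the version quoted above. isAvailable() and versionInfo() are exported by the rstudioapi package, which is assumed to be installed here.

```r
# Hypothetical helper: TRUE when `version` is older than `threshold`.
# as.numeric_version() accepts both strings and numeric_version objects.
is_old_rstudio <- function(version, threshold = "2021.9.1.372") {
  as.numeric_version(version) < as.numeric_version(threshold)
}

# Versions are compared component by component, so 10 counts as greater
# than 9 in the second position, which a plain string comparison gets wrong.
is_old_rstudio("2021.10.0.1")

# Sketch of use in a package's .onAttach (assumes rstudioapi is installed).
.onAttach <- function(libname, pkgname) {
  if (requireNamespace("rstudioapi", quietly = TRUE) &&
      rstudioapi::isAvailable() &&
      is_old_rstudio(rstudioapi::versionInfo()$version)) {
    packageStartupMessage("RStudio older than 2021.9.1.372 detected.")
  }
}
```

Keeping the comparison in a small pure function makes it testable outside RStudio, where the .rs.* functions are not defined.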