"scheduled cores ... did not deliver results, all values of the jobs will be affected" warning from parallel::mclapply() in R 4.0.1

I am using parallel::mclapply() with R 4.0.1 and get the following warning: "scheduled cores ... did not deliver results, all values of the jobs will be affected".

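Here is what I have found so far. Reading the function's source code, I realized that the warning is raised when the vector dr is not all TRUE. That means that, for some cores, the second branch inside the for loop below (is.raw(a)) is never executed. a is the value returned by readChild(); if raw data had been returned at least once, that branch would have been taken at least once. So I think readChild() is returning NULL.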

readChild and readChildren return a raw vector with a "pid" attribute if data were available, an integer vector of length one with the process ID if a child terminated or NULL if the child no longer exists (no children at all for readChildren).

Please confirm or refute my conclusion. And if it is correct, what are the possible causes?

    while (!all(fin)) {
        s <- selectChildren(ac[!fin], -1)
        if (is.null(s)) break # no children -> no hope we get anything (should not happen)
        if (is.integer(s))
            for (ch in s) {
                a <- readChild(ch)
                if (is.integer(a)) {
                    core <- which(cp == a)
                    fin[core] <- TRUE
                } else if (is.raw(a)) {
                    core <- which(cp == attr(a, "pid"))
                    job.res[[core]] <- ijr <- unserialize(a)
                    if (inherits(ijr, "try-error"))
                        has.errors <- c(has.errors, core)
                    dr[core] <- TRUE
                } else if (is.null(a)) {
                    # the child no longer exists (should not happen)
                    core <- which(cp == ch)
                    fin[core] <- TRUE
                }
            }
    }

This warning can appear when a child process dies or crashes, for example:

    > y <- parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
    Warning message:
    In parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x) :
      scheduled core 1 did not deliver a result, all values of the job will be affected

    > str(y)
    List of 2
     $ : NULL
     $ : int 2
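
To find out which elements were lost, something along these lines may help. It is only a minimal sketch: it assumes a failed job shows up either as NULL (the child died, as above) or as a "try-error" object (FUN raised an error). `failed_jobs` is a hypothetical helper, not part of parallel, and it will misfire if FUN can legitimately return NULL. The list `res` is hand-made here to stand in for an mclapply() result, so that the snippet runs without forking.

```r
# Hypothetical helper (not part of 'parallel'): flag elements that came
# back as NULL or as a "try-error" object.
failed_jobs <- function(res) {
  vapply(
    res,
    function(r) is.null(r) || inherits(r, "try-error"),
    logical(1)
  )
}

# Hand-made stand-in for an mclapply() result:
#   element 1: child died, nothing delivered
#   element 2: normal value
#   element 3: FUN raised an error
res <- list(NULL, 2L, try(stop("boom"), silent = TRUE))

failed <- failed_jobs(res)
which(failed)  # indices of the elements that need a closer look
```

Applied to `y` from the example above, `which(failed_jobs(y))` should point at element 1. Keep in mind that with the default prescheduling each core handles several elements, so a single dead child can blank out more than one entry.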

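A child process dying outright is of course not good. It can happen for many reasons. My best guess is that you are parallelizing something that cannot be parallelized. Forked processing (= mclapply()) is known to be unstable with, for example, multi-threaded code.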
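
One way to check whether forking is the culprit is to run the same job on non-forked workers. The sketch below uses a PSOCK cluster from the parallel package, where each worker is a fresh R session instead of a fork. `slow_square` is a hypothetical stand-in for your real function, and the worker count of 2 is an arbitrary choice. This is a diagnostic idea under those assumptions, not a guaranteed fix.

```r
library(parallel)

# Hypothetical stand-in for the real per-element work
slow_square <- function(x) x^2

# PSOCK cluster: workers are separate R sessions, not forked children
cl <- makeCluster(2)

# Always shut the workers down, even if the computation fails
res <- tryCatch(
  parLapply(cl, 1:4, slow_square),
  finally = stopCluster(cl)
)

str(res)
```

If the job runs cleanly this way but keeps losing results under mclapply(), that points toward something in the code not being fork-safe. Note that PSOCK workers start empty, so packages and global objects that the function needs have to be loaded or exported to them explicitly (for example with clusterEvalQ() and clusterExport()).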

For what it's worth, if you use the future framework instead (disclaimer: I'm the author), you get a more informative error message, for example:

    > library(future.apply)
    > plan(multicore)

    > y <- future_lapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
    Error: Failed to retrieve the result of MulticoreFuture (future_lapply-1) from
    the forked worker (on localhost; PID 19959). Post-mortem diagnostic: No process
    exists with this PID, i.e. the forked localhost worker is no longer alive.