fisher.test crashes R with *** caught segfault *** error

Question

As the title says, fisher.test crashes R with a *** caught segfault *** error. This is the code that produces the error:

d<-matrix(c(1,0,5,2,1,90,0,0,0,1,0,14,0,0,0,0,0,5,0,
            0,0,0,0,2,0,0,0,0,0,2,2,1,0,2,3,89),
          nrow=6,byrow = TRUE)
fisher.test(d,simulate.p.value=FALSE)

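I found this because I use fisher.test inside some functions; running them on generated data crashes R with the error above. I understand that the table passed to fisher.test is ill-behaved, but I think this kind of thing should not happen.

I would appreciate any suggestions on which conditions the contingency table should satisfy, or which other parameters should be set in fisher.test, to avoid this kind of crash. I did some tests, in which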

fisher.test(d,simulate.p.value=TRUE)

does not crash and produces a result.

I am asking because I have to implement this to keep my pipeline from crashing in the future.

Answer 1

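I can confirm that this is a bug in R 4.2, and that it is now fixed in the development branch of R (in a commit on 7 May). I wouldn't be surprised if it were ported to a patch release soon, but that is unknown and up to the R developers. Running your example above no longer segfaults, but it does throw an error: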

Error in fisher.test(d, simulate.p.value = FALSE) : FEXACT[f3xact()] error: hash key 5e+09 > INT_MAX, kyy=203, it[i (= nco = 6)]= 0.
Rather set 'simulate.p.value=TRUE'

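So this will make your workflow better (you can handle these errors with try()/tryCatch()), but it won't help if you really want to perform an exact Fisher test on these data. (Exact tests on large tables with large entries are extremely computationally difficult, because they essentially have to compute over the set of all possible tables with the given marginal values.)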
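A minimal sketch of the try()/tryCatch() approach, assuming an R version in which the failure is an ordinary R error rather than a segfault (tryCatch() cannot intercept a segfault). The helper name safe_fisher and its default B are made up for illustration:

```r
# Hypothetical helper (not part of base R): try the exact test first and
# fall back to the Monte Carlo p-value if fisher.test() signals an error.
# This only helps where the failure is a regular R error, not a segfault.
safe_fisher <- function(tab, B = 10000, ...) {
  tryCatch(
    fisher.test(tab, simulate.p.value = FALSE, ...),
    error = function(e) {
      message("exact test failed: ", conditionMessage(e),
              "; falling back to simulation with B = ", B)
      fisher.test(tab, simulate.p.value = TRUE, B = B, ...)
    }
  )
}

# Usage on a small table where the exact test succeeds
small <- matrix(c(3, 1, 1, 3), nrow = 2)
res <- safe_fisher(small)
res$p.value
```

On the table from the question this should end up in the simulation branch once the fix is installed; on an affected R 4.2 build it would still crash before tryCatch() gets a chance to run.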

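I don't have any brilliant ideas for detecting the exact conditions that cause this problem. Maybe you could come up with a rough rule based on the dimensions of the table and the sum of the counts in it, e.g. if (prod(dim(d)) > 30 && sum(d) > 200) ... ?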
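A sketch of such a pre-check, using the rough thresholds above. The function name needs_simulation is invented for illustration, and the cut-offs 30 and 200 are a guess rather than a tested boundary:

```r
# Hypothetical pre-check: flag tables that are both large (many cells)
# and heavily populated (large total count). The thresholds are rough
# guesses, not a guaranteed boundary for when the exact test fails.
needs_simulation <- function(tab, max_cells = 30, max_total = 200) {
  prod(dim(tab)) > max_cells && sum(tab) > max_total
}

# The table from the question: 36 cells, total count 220
d <- matrix(c(1,0,5,2,1,90,0,0,0,1,0,14,0,0,0,0,0,5,0,
              0,0,0,0,2,0,0,0,0,0,2,2,1,0,2,3,89),
            nrow = 6, byrow = TRUE)

res <- if (needs_simulation(d)) {
  fisher.test(d, simulate.p.value = TRUE, B = 10000)
} else {
  fisher.test(d)
}
res$p.value
```

Because the check runs before fisher.test is called with simulate.p.value = FALSE, it also protects R versions that still segfault, at the cost of sometimes simulating when the exact test would have worked.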

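Setting simulate.p.value=TRUE is the most sensible approach. However, if you expect precise results for extreme tables (e.g. you work in bioinformatics and plan to apply a huge multiple-comparisons correction to the results), you will be disappointed. For example: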

dd <- matrix(0, 6, 6)
dd[5,5] <- dd[6,6] <- 100
fisher.test(dd)$p.value 
## 2.208761e-59, reported as "< 2.2e-16"
fisher.test(dd, simulate.p.value = TRUE, B = 10000)$p.value
# 9.999e-05

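fisher.test(..., simulate.p.value = TRUE) will never return a value less than 1/(B+1), which is what you get when none of the simulated tables is more extreme than the observed table (technically, the p-value should then be reported as "<= 9.999e-05"). So you will never (in the lifetime of the universe) be able to compute a p-value like 1e-59 this way; you can only set a bound that depends on how large you are willing to make B.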
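To illustrate the bound, here is a small sketch that compares the simulated p-value for the dd table with 1/(B+1) for two values of B. The variable names are chosen for illustration only:

```r
# Extreme table from the example above
dd <- matrix(0, 6, 6)
dd[5, 5] <- dd[6, 6] <- 100

set.seed(1)                      # simulation is random; fix the seed
Bs     <- c(999, 9999)           # numbers of Monte Carlo replicates
floors <- 1 / (Bs + 1)           # smallest p-value each B can return
p_sim  <- sapply(Bs, function(B) {
  fisher.test(dd, simulate.p.value = TRUE, B = B)$p.value
})

# Side-by-side comparison: the simulated value never drops below the floor
data.frame(B = Bs, floor = floors, p_sim = p_sim)
```

Increasing B tenfold only lowers the bound tenfold, which is why a value near 1e-59 is out of reach by simulation.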
