Does R incorrectly compute the chi-squared statistic for 2x2 tables with low cell counts?

Question

I just noticed that for 2 x 2 tables with low cell frequencies, R does not seem to compute the chi^2 statistic correctly, even with the Yates correction.

mat <- matrix(c(3, 2, 14, 10), ncol = 2)
chi <- stats::chisq.test(mat)
## Warning message:
## In stats::chisq.test(mat) : Chi-squared approximation may be incorrect

# from the function
chi$statistic
##    X-squared 
## 1.626059e-31 

# as it should be (with Yates correction)
sum((abs(chi$observed - chi$expected) - 0.5)^2 / chi$expected)
## [1] 0.1851001

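Am I right in thinking that R is computing this incorrectly and that the second method, which yields .185, is more accurate? Or do small cell counts mean that all bets are off?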

Update:

Without the Yates continuity correction, it appears to work fine:

chi <- stats::chisq.test(mat, correct = FALSE)
## Warning message:
## In stats::chisq.test(mat, correct = FALSE) :
##   Chi-squared approximation may be incorrect

chi$statistic
##   X-squared 
## 0.004738562 

sum((abs(chi$observed - chi$expected))^2 / chi$expected)
## [1] 0.004738562

Answer 1

The help file / man page (?chisq.test) states:

one half is subtracted from all |O - E| differences; however,
the correction will not be bigger than the differences themselves.

The differences in your example are all smaller than 0.5:

> chi$observed - chi$expected
            [,1]        [,2]
[1,]  0.06896552 -0.06896552
[2,] -0.06896552  0.06896552

So, at the very least, this appears to be documented behaviour.
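
The capping described in the help page can be reproduced by hand. The following is a minimal sketch, not the actual source of chisq.test: the variable names (expected, diffs, yates, stat_capped, stat_naive) are made up for illustration, and it assumes the correction is taken as the smaller of 0.5 and the smallest |O - E| difference.

```r
# Same table as in the question
mat <- matrix(c(3, 2, 14, 10), ncol = 2)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(mat), colSums(mat)) / sum(mat)

# Absolute observed-minus-expected differences
diffs <- abs(mat - expected)

# Assumption: the correction is capped so it never exceeds the differences
yates <- min(0.5, diffs)

# Statistic with the capped correction
stat_capped <- sum((diffs - yates)^2 / expected)

# Statistic with an uncapped 0.5 correction, as computed in the question
stat_naive <- sum((diffs - 0.5)^2 / expected)

stat_capped
stat_naive
```

Here every |O - E| is about 0.069, so the capped correction cancels the differences entirely and the statistic is zero up to floating-point error, in line with the 1.626059e-31 reported in the question, whereas the uncapped formula gives the question's 0.1851001.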

Side note: if in doubt, you can always use p-values found by simulation:

> chi <- stats::chisq.test(mat, simulate.p.value=TRUE, B=1e6)
> chi

    Pearson's Chi-squared test with simulated p-value (based on 1e+06 replicates)

data:  mat
X-squared = 0.0047386, df = NA, p-value = 1

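In this case the continuity correction is not applied, so the reported statistic is the uncorrected one (0.0047386), and the warning goes away. Or use fisher.test...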
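
For the fisher.test route, a minimal sketch on the same table could look like the following. The variable name ft is just illustrative, and the default two-sided alternative is assumed.

```r
# Same table as in the question
mat <- matrix(c(3, 2, 14, 10), ncol = 2)

# Fisher's exact test does not rely on the chi-squared approximation,
# so low cell counts do not trigger the approximation warning
ft <- fisher.test(mat)

ft$p.value   # exact two-sided p-value
ft$estimate  # conditional maximum likelihood estimate of the odds ratio
```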
