Shapiro Wilk 检验的临界值

Question

我正在尝试获取 R 中 Shapiro Wilk 检验的临界 W 值。

        Shapiro-Wilk normality test

data:  samplematrix[, 1]
W = 0.69661, p-value = 7.198e-09

在 n=50 和 alpha=.05 的情况下，通过计算临界值 table，我知道临界值 W=.947。但是，如何使用 R 获得这个临界值？

Answer 1

直接计算临界值并不容易（见此CrossValidated answer）；我在这里得到的内容与该答案中的内容基本相同（尽管我独立提出了它，并且它通过使用订单统计而不是随机样本稍微改进了该答案）。这个想法是，我们可以使样本逐渐变得更加非正态，直到它准确地获得所需的 p 值（在本例中为 0.05），然后查看与该样本对应的 W 统计量。

## compute S-W for a given Gamma shape parameter and sample size
tmpf <- function(gshape=20,n=50) {
    shapiro.test(qgamma((1:n)/(n+1),scale=1,shape=gshape))
}
## find shape parameter that corresponds to a particular p-value
find.shape <- function(n,alpha) {
    uniroot(function(x) tmpf(x,n)$p.value-alpha,
            interval=c(0.01,100))$root
}
find.W <- function(n,alpha) {
   s <- find.shape(n,alpha)
   tmpf(s,n=n)$statistic
}
find.W(50,0.05)

答案 (0.9540175) 与您得到的答案不完全相同，因为 R 使用了 Shapiro-Wilk 检验的近似值。据我所知，实际的 S-W 临界值表完全来自 Shapiro 和 Wilk 1965 Biometrika http://www.jstor.org/stable/2333709 p. 605，只写 "Based on fitted Johnson (1949) S_B approximation, see Shapiro and Wilk 1965a for details" - 而 "Shapiro and Wilk 1965a" 指的是未发表的手稿！（S&W 本质上对许多正态偏差进行采样，计算 SW 统计量，在一系列值上构建 SW 统计量的平滑近似值，并从该分布中获取临界值）。

我也尝试通过蛮力来做到这一点，但是（见下文）如果我们想要天真而不像 SW 那样进行曲线拟合，我们将需要更大的样本...

find.W.stoch <- function(n=50,alpha=0.05,N=200000,.progress="none") {
   d <- plyr::raply(N,.Call(stats:::C_SWilk,sort(rnorm(n))),
                    .progress=.progress)
   return(quantile(d[1,],1-alpha))
}

将原始 S&W 值（从论文中转录）与 R 近似值进行比较：

  SW1965 <- c(0.767,0.748,0.762,0.788,0.803,0.818,0.829,0.842,
    0.850,0.859,0.866,0.874,0.881,0.887,0.892,0.897,0.901,0.905,
    0.908,0.911,0.914,0.916,0.918,0.920,0.923,0.924,0.926,0.927,
    0.929,0.930,0.931,0.933,0.934,0.935,0.936,0.938,0.939,0.940,
    0.941,0.942,0.943,0.944,0.945,0.945,0.946,0.947,0.947,0.947)
  Rapprox <- sapply(3:50,find.W,alpha=0.05)
  Rapprox.stoch <- sapply(3:50,find.W.stoch,alpha=0.05,.progress="text")
  par(bty="l",las=1)
  matplot(3:50,cbind(SW1965,Rapprox,Rapprox.stoch),col=c(1,2,4),
          type="l",
          xlab="n",ylab=~W[crit])
  legend("bottomright",col=c(1,2,4),lty=1:3,
         c("SW orig","R approx","stoch"))

Shapiro Wilk 检验的临界值

Critical Value for Shapiro Wilk test

statistics

r

distribution

normal-distribution