R 中多组数据的正态性检验

Question

我正在尝试运行对我在 R 中的数据进行正态性测试。我的数据集是由 4 列字符和一列数值组成的数据框。目前，我在 R 中使用 Rstatix 包，其他类型的统计测试运行良好，如 wilcox_test() 和 kruskal_test() 但是当我尝试运行 shapiro_test() 它不起作用并出现以下错误：

data %>% group_by(treatment,chase,measure) %>% shapiro_test(value)
x
+-<error/dplyr:::mutate_error>
| Problem with `mutate()` input `data`.
| x Must group by variables found in `.data`.
| * Column `variable` is not found.
| i Input `data` is `map(.data$data, .f, ...)`.
\-<error/rlang_error>
  Must group by variables found in `.data`.
  * Column `variable` is not found.
Backtrace:
  1. dplyr::group_by(., treatment, chase, measure)
  2. rstatix::shapiro_test(., value)
 33. rstatix:::.f(.x[[i]], ...)
 11. dplyr::group_by(., variable)
 43. dplyr::group_by_prepare(.data, ..., .add = .add)

我的数据集如下：

    groups treatment chase measure     value
1   uncoated   control    30  colocA 17.912954
2   uncoated   control    30  colocA 16.806409
3   uncoated   control    30  colocA 20.322467
4   uncoated   control    30  colocA 15.953959
5   uncoated   control    30  colocA 22.566408
6   uncoated   control    30  colocA 17.780975
7   uncoated   control    30  colocA 19.764265
8   uncoated   control    30  colocA 16.928500
9   uncoated   control    30  colocA 22.931763
10  uncoated   control    30  colocA 18.101085
11  uncoated   control    30  distCC  1.159298
12  uncoated   control    30  distCC  1.174931
13  uncoated   control    30  distCC  1.190449
14  uncoated   control    30  distCC  1.265717
15  uncoated   control    30  distCC  1.103845
16  uncoated   control    30  distCC  1.125344
17  uncoated   control    30  distCC  1.290703
18  uncoated   control    30  distCC  1.172462
19  uncoated   control    30  distCC  1.065353
20  uncoated   control    30  distCC  1.048523
21    coated   control    30  colocA  6.062000
22    coated   control    30  colocA  9.370714
23    coated   control    30  colocA 12.898769
24    coated   control    30  colocA 20.398458
25    coated   control    30  colocA 11.174150
26    coated   control    30  colocA 17.574250
27    coated   control    30  colocA 12.481857
28    coated   control    30  colocA 21.565250
29    coated   control    30  colocA 21.743409
30    coated   control    30  colocA 12.699600
31    coated   control    30  distCC  4.317260
32    coated   control    30  distCC  4.263914
33    coated   control    30  distCC  5.136013
34    coated   control    30  distCC  3.142906
35    coated   control    30  distCC  2.617590
36    coated   control    30  distCC  4.149614
37    coated   control    30  distCC  4.995551
38    coated   control    30  distCC  3.851803
39    coated   control    30  distCC  4.606119
40    coated   control    30  distCC  2.820326

提前致谢。

Answer 1

这是 stats::shapiro.test 的方法。

library(dplyr)
library(broom)

data %>%
  group_by(treatment, chase, measure) %>% 
  do(tidy(shapiro.test(.$value)))
## A tibble: 2 x 6
## Groups:   treatment, chase, measure [2]
#  treatment chase measure statistic p.value method                     
#  <chr>     <int> <chr>       <dbl>   <dbl> <chr>                      
#1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
#2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test

Answer 2

我们也可以将输出包装在 list in summarise and unnest it

library(dplyr)
library(tidyr)
library(broom)
dat %>% 
    group_by(treatment, chase, measure) %>%
    summarise(out = list(shapiro.test(value) %>% tidy), .groups = 'drop') %>%
    unnest(c(out))
# A tibble: 2 x 6
#  treatment chase measure statistic p.value method                     
#  <chr>     <int> <chr>       <dbl>   <dbl> <chr>                      
#1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
#2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test

数据

dat <- structure(list(groups = c("uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "coated", 
"coated", "coated", "coated", "coated", "coated", "coated", "coated", 
"coated", "coated", "coated", "coated", "coated", "coated", "coated", 
"coated", "coated", "coated", "coated", "coated"), treatment = c("control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control"), chase = c(30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), measure = c("colocA", 
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA", 
"colocA", "colocA", "distCC", "distCC", "distCC", "distCC", "distCC", 
"distCC", "distCC", "distCC", "distCC", "distCC", "colocA", "colocA", 
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA", 
"colocA", "distCC", "distCC", "distCC", "distCC", "distCC", "distCC", 
"distCC", "distCC", "distCC", "distCC"), value = c(17.912954, 
16.806409, 20.322467, 15.953959, 22.566408, 17.780975, 19.764265, 
16.9285, 22.931763, 18.101085, 1.159298, 1.174931, 1.190449, 
1.265717, 1.103845, 1.125344, 1.290703, 1.172462, 1.065353, 1.048523, 
6.062, 9.370714, 12.898769, 20.398458, 11.17415, 17.57425, 12.481857, 
21.56525, 21.743409, 12.6996, 4.31726, 4.263914, 5.136013, 3.142906, 
2.61759, 4.149614, 4.995551, 3.851803, 4.606119, 2.820326)),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40"))

R 中多组数据的正态性检验

Normality test for multi-grouped data in R

r

hypothesis-test

rstatix

数据