R 中多组数据的正态性检验
Normality test for multi-grouped data in R
我正在尝试 运行 对我在 R 中的数据进行正态性测试。我的数据集是由 4 列字符和一列数值组成的数据框。目前,我在 R 中使用 Rstatix 包,其他类型的统计测试运行良好,如 wilcox_test() 和 kruskal_test() 但是当我尝试 运行 shapiro_test() 它不起作用并出现以下错误:
data %>% group_by(treatment,chase,measure) %>% shapiro_test(value)
x
+-<error/dplyr:::mutate_error>
| Problem with `mutate()` input `data`.
| x Must group by variables found in `.data`.
| * Column `variable` is not found.
| i Input `data` is `map(.data$data, .f, ...)`.
\-<error/rlang_error>
Must group by variables found in `.data`.
* Column `variable` is not found.
Backtrace:
1. dplyr::group_by(., treatment, chase, measure)
2. rstatix::shapiro_test(., value)
33. rstatix:::.f(.x[[i]], ...)
11. dplyr::group_by(., variable)
43. dplyr::group_by_prepare(.data, ..., .add = .add)
我的数据集如下:
groups treatment chase measure value
1 uncoated control 30 colocA 17.912954
2 uncoated control 30 colocA 16.806409
3 uncoated control 30 colocA 20.322467
4 uncoated control 30 colocA 15.953959
5 uncoated control 30 colocA 22.566408
6 uncoated control 30 colocA 17.780975
7 uncoated control 30 colocA 19.764265
8 uncoated control 30 colocA 16.928500
9 uncoated control 30 colocA 22.931763
10 uncoated control 30 colocA 18.101085
11 uncoated control 30 distCC 1.159298
12 uncoated control 30 distCC 1.174931
13 uncoated control 30 distCC 1.190449
14 uncoated control 30 distCC 1.265717
15 uncoated control 30 distCC 1.103845
16 uncoated control 30 distCC 1.125344
17 uncoated control 30 distCC 1.290703
18 uncoated control 30 distCC 1.172462
19 uncoated control 30 distCC 1.065353
20 uncoated control 30 distCC 1.048523
21 coated control 30 colocA 6.062000
22 coated control 30 colocA 9.370714
23 coated control 30 colocA 12.898769
24 coated control 30 colocA 20.398458
25 coated control 30 colocA 11.174150
26 coated control 30 colocA 17.574250
27 coated control 30 colocA 12.481857
28 coated control 30 colocA 21.565250
29 coated control 30 colocA 21.743409
30 coated control 30 colocA 12.699600
31 coated control 30 distCC 4.317260
32 coated control 30 distCC 4.263914
33 coated control 30 distCC 5.136013
34 coated control 30 distCC 3.142906
35 coated control 30 distCC 2.617590
36 coated control 30 distCC 4.149614
37 coated control 30 distCC 4.995551
38 coated control 30 distCC 3.851803
39 coated control 30 distCC 4.606119
40 coated control 30 distCC 2.820326
提前致谢。
这是 stats::shapiro.test
的方法。
library(dplyr)
library(broom)
data %>%
group_by(treatment, chase, measure) %>%
do(tidy(shapiro.test(.$value)))
## A tibble: 2 x 6
## Groups: treatment, chase, measure [2]
# treatment chase measure statistic p.value method
# <chr> <int> <chr> <dbl> <dbl> <chr>
#1 control 30 colocA 0.940 0.244 Shapiro-Wilk normality test
#2 control 30 distCC 0.811 0.00128 Shapiro-Wilk normality test
我们也可以将输出包装在 list
in summarise
and unnest
it
library(dplyr)
library(tidyr)
library(broom)
dat %>%
group_by(treatment, chase, measure) %>%
summarise(out = list(shapiro.test(value) %>% tidy), .groups = 'drop') %>%
unnest(c(out))
# A tibble: 2 x 6
# treatment chase measure statistic p.value method
# <chr> <int> <chr> <dbl> <dbl> <chr>
#1 control 30 colocA 0.940 0.244 Shapiro-Wilk normality test
#2 control 30 distCC 0.811 0.00128 Shapiro-Wilk normality test
数据
dat <- structure(list(groups = c("uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "coated",
"coated", "coated", "coated", "coated", "coated", "coated", "coated",
"coated", "coated", "coated", "coated", "coated", "coated", "coated",
"coated", "coated", "coated", "coated", "coated"), treatment = c("control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control"), chase = c(30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), measure = c("colocA",
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA",
"colocA", "colocA", "distCC", "distCC", "distCC", "distCC", "distCC",
"distCC", "distCC", "distCC", "distCC", "distCC", "colocA", "colocA",
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA",
"colocA", "distCC", "distCC", "distCC", "distCC", "distCC", "distCC",
"distCC", "distCC", "distCC", "distCC"), value = c(17.912954,
16.806409, 20.322467, 15.953959, 22.566408, 17.780975, 19.764265,
16.9285, 22.931763, 18.101085, 1.159298, 1.174931, 1.190449,
1.265717, 1.103845, 1.125344, 1.290703, 1.172462, 1.065353, 1.048523,
6.062, 9.370714, 12.898769, 20.398458, 11.17415, 17.57425, 12.481857,
21.56525, 21.743409, 12.6996, 4.31726, 4.263914, 5.136013, 3.142906,
2.61759, 4.149614, 4.995551, 3.851803, 4.606119, 2.820326)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40"))
我正在尝试 运行 对我在 R 中的数据进行正态性测试。我的数据集是由 4 列字符和一列数值组成的数据框。目前,我在 R 中使用 Rstatix 包,其他类型的统计测试运行良好,如 wilcox_test() 和 kruskal_test() 但是当我尝试 运行 shapiro_test() 它不起作用并出现以下错误:
data %>% group_by(treatment,chase,measure) %>% shapiro_test(value)
x
+-<error/dplyr:::mutate_error>
| Problem with `mutate()` input `data`.
| x Must group by variables found in `.data`.
| * Column `variable` is not found.
| i Input `data` is `map(.data$data, .f, ...)`.
\-<error/rlang_error>
Must group by variables found in `.data`.
* Column `variable` is not found.
Backtrace:
1. dplyr::group_by(., treatment, chase, measure)
2. rstatix::shapiro_test(., value)
33. rstatix:::.f(.x[[i]], ...)
11. dplyr::group_by(., variable)
43. dplyr::group_by_prepare(.data, ..., .add = .add)
我的数据集如下:
groups treatment chase measure value
1 uncoated control 30 colocA 17.912954
2 uncoated control 30 colocA 16.806409
3 uncoated control 30 colocA 20.322467
4 uncoated control 30 colocA 15.953959
5 uncoated control 30 colocA 22.566408
6 uncoated control 30 colocA 17.780975
7 uncoated control 30 colocA 19.764265
8 uncoated control 30 colocA 16.928500
9 uncoated control 30 colocA 22.931763
10 uncoated control 30 colocA 18.101085
11 uncoated control 30 distCC 1.159298
12 uncoated control 30 distCC 1.174931
13 uncoated control 30 distCC 1.190449
14 uncoated control 30 distCC 1.265717
15 uncoated control 30 distCC 1.103845
16 uncoated control 30 distCC 1.125344
17 uncoated control 30 distCC 1.290703
18 uncoated control 30 distCC 1.172462
19 uncoated control 30 distCC 1.065353
20 uncoated control 30 distCC 1.048523
21 coated control 30 colocA 6.062000
22 coated control 30 colocA 9.370714
23 coated control 30 colocA 12.898769
24 coated control 30 colocA 20.398458
25 coated control 30 colocA 11.174150
26 coated control 30 colocA 17.574250
27 coated control 30 colocA 12.481857
28 coated control 30 colocA 21.565250
29 coated control 30 colocA 21.743409
30 coated control 30 colocA 12.699600
31 coated control 30 distCC 4.317260
32 coated control 30 distCC 4.263914
33 coated control 30 distCC 5.136013
34 coated control 30 distCC 3.142906
35 coated control 30 distCC 2.617590
36 coated control 30 distCC 4.149614
37 coated control 30 distCC 4.995551
38 coated control 30 distCC 3.851803
39 coated control 30 distCC 4.606119
40 coated control 30 distCC 2.820326
提前致谢。
这是 stats::shapiro.test
的方法。
library(dplyr)
library(broom)
data %>%
group_by(treatment, chase, measure) %>%
do(tidy(shapiro.test(.$value)))
## A tibble: 2 x 6
## Groups: treatment, chase, measure [2]
# treatment chase measure statistic p.value method
# <chr> <int> <chr> <dbl> <dbl> <chr>
#1 control 30 colocA 0.940 0.244 Shapiro-Wilk normality test
#2 control 30 distCC 0.811 0.00128 Shapiro-Wilk normality test
我们也可以将输出包装在 list
in summarise
and unnest
it
library(dplyr)
library(tidyr)
library(broom)
dat %>%
group_by(treatment, chase, measure) %>%
summarise(out = list(shapiro.test(value) %>% tidy), .groups = 'drop') %>%
unnest(c(out))
# A tibble: 2 x 6
# treatment chase measure statistic p.value method
# <chr> <int> <chr> <dbl> <dbl> <chr>
#1 control 30 colocA 0.940 0.244 Shapiro-Wilk normality test
#2 control 30 distCC 0.811 0.00128 Shapiro-Wilk normality test
数据
dat <- structure(list(groups = c("uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "coated",
"coated", "coated", "coated", "coated", "coated", "coated", "coated",
"coated", "coated", "coated", "coated", "coated", "coated", "coated",
"coated", "coated", "coated", "coated", "coated"), treatment = c("control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control"), chase = c(30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), measure = c("colocA",
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA",
"colocA", "colocA", "distCC", "distCC", "distCC", "distCC", "distCC",
"distCC", "distCC", "distCC", "distCC", "distCC", "colocA", "colocA",
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA",
"colocA", "distCC", "distCC", "distCC", "distCC", "distCC", "distCC",
"distCC", "distCC", "distCC", "distCC"), value = c(17.912954,
16.806409, 20.322467, 15.953959, 22.566408, 17.780975, 19.764265,
16.9285, 22.931763, 18.101085, 1.159298, 1.174931, 1.190449,
1.265717, 1.103845, 1.125344, 1.290703, 1.172462, 1.065353, 1.048523,
6.062, 9.370714, 12.898769, 20.398458, 11.17415, 17.57425, 12.481857,
21.56525, 21.743409, 12.6996, 4.31726, 4.263914, 5.136013, 3.142906,
2.61759, 4.149614, 4.995551, 3.851803, 4.606119, 2.820326)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40"))