Several Grubbs tests simultaneously in R

Question

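I am new to R and have just started using the outliers package. This is probably easy, but can someone tell me how to run several Grubbs tests at the same time? I have 20 columns and want to test all of them at once. Thanks in advance.

Edit: Sorry for not explaining this well. I'll try. I started using R today and learned how to run a Grubbs test with grubbs.test(data$S1, type = 10) (or type = 11 or 20), and that works fine. But I have a table with 20 columns, and I want to run the Grubbs test on every column at once. I could do them one by one, but I think there must be a faster way. I also ran the code from "How to repeat the Grubbs test and flag the outliers", and it works perfectly, but again, I would like to do it for all 20 of my samples. A sample of my data: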

 S1  S2  S3  S4  S5  S6  S7
 96  40  99  45  12  16  48
 52  49  11  49  59  77  64
 18  43  11  67   6  97  91
 79  19  39  28  45  44  99
  9  78  88   6  25  43  78
 60  12  29  32   2  68  25
 18  61  60  30  26  51  70
 96  98  55  74  83  17  69
 19   0  17  24   0  75  45
 42  70  71   7  61  82 100
 39  80  71  58   6 100  94
100   5  41  18  33  98  97

I hope that helps.

Answer 1

You can use lapply:

library(outliers)

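# Example data: three columns of 20 uniform random numbers
df <- data.frame(a = runif(20), b = runif(20), c = runif(20))
# Run grubbs.test on every column; the result is a named list of tests
tests <- lapply(df, grubbs.test)
# or with parameters:
tests <- lapply(df, grubbs.test, opposite = TRUE)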

Result:

> tests
$a

    Grubbs test for one outlier

data:  X[[i]]
G = 1.80680, U = 0.81914, p-value = 0.6158
alternative hypothesis: highest value 0.963759744539857 is an outlier


$b

    Grubbs test for one outlier

data:  X[[i]]
G = 1.53140, U = 0.87008, p-value = 1
alternative hypothesis: highest value 0.975481075001881 is an outlier


$c

    Grubbs test for one outlier

data:  X[[i]]
G = 1.57910, U = 0.86186, p-value = 1
alternative hypothesis: lowest value 0.0136249314527959 is an outlier

You can access the results as follows:

> tests$a$statistic
        G         U 
1.8067906 0.8191417
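To see every column at a glance, the same access pattern can be applied across the whole list. The following is only a sketch: it assumes the values in the question are laid out row by row under S1 to S7, uses just the first three columns, and the names dat and summary_table are made up for illustration.

```r
library(outliers)

# First three columns of the example data from the question
# (assumption: the values are read row by row under the headers S1..S7)
dat <- data.frame(
  S1 = c(96, 52, 18, 79, 9, 60, 18, 96, 19, 42, 39, 100),
  S2 = c(40, 49, 43, 19, 78, 12, 61, 98, 0, 70, 80, 5),
  S3 = c(99, 11, 11, 39, 88, 29, 60, 55, 17, 71, 71, 41)
)

# Run the test on every column; extra arguments such as `type`
# are passed through to grubbs.test by lapply
tests <- lapply(dat, grubbs.test, type = 10)

# Collect one row per column: both statistics and the p-value
summary_table <- data.frame(
  column    = names(tests),
  G         = sapply(tests, function(x) x$statistic[[1]]),
  U         = sapply(tests, function(x) x$statistic[[2]]),
  p_value   = sapply(tests, function(x) x$p.value),
  row.names = NULL
)
print(summary_table)
```

With the full table, passing the complete data frame instead of dat should give one row for each of the 20 columns.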

Hope this helps.

Answer 2

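@Florian's answer can be updated a little. For example, using the purrr package together with the tidyverse gives a nice, readable result. This is useful if you are comparing a lot of groups:

Load the necessary packages: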

library(dplyr)
library(purrr)
library(tidyr)
library(outliers)

Create some data - we'll use the same data as in @Florian's answer, but converted to a modern tibble and to long format:

df <-  tibble(a = runif(20), 
              b = runif(20),
              c = runif(20)) %>%
  # transform to long format
  tidyr::gather(letter, value)

Then we can use map and map_dbl from purrr instead of the apply functions:

df %>%
  group_by(letter) %>%
  nest() %>% 
  mutate(n = map_dbl(data, ~ nrow(.x)), # number of entries
         G = map(data, ~ grubbs.test(.x$value)$statistic[[1]]), # G statistic
         U = map(data, ~ grubbs.test(.x$value)$statistic[[2]]), # U statistic
         grubbs = map(data, ~ grubbs.test(.x$value)$alternative), # Alternative hypothesis
         p_grubbs = map_dbl(data, ~ grubbs.test(.x$value)$p.value)) %>% # p-value
  # Let's make the output more fancy
  mutate(G = signif(unlist(G), 3),
         U = signif(unlist(U), 3),
         grubbs = unlist(grubbs),
         p_grubbs = signif(p_grubbs, 3)) %>%
  select(-data) %>% # remove temporary column
  arrange(p_grubbs)

The desired output looks like this:

# A tibble: 3 x 6
  letter     n     G     U grubbs                                        p_grubbs
  <chr>  <dbl> <dbl> <dbl> <chr>                                            <dbl>
1 c         20  1.68 0.843 lowest value 0.0489965472370386 is an outlier     0.84
2 a         20  1.58 0.862 lowest value 0.0174888013862073 is an outlier     1   
3 b         20  1.57 0.863 lowest value 0.0656482006888837 is an outlier     1
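The pipeline above calls grubbs.test four times per group. A possible variant, shown here only as a sketch, runs the test once per group, stores it in a list column, and then extracts the pieces. It assumes reasonably recent dplyr (1.0 or later, for .groups) and tidyr (for pivot_longer in place of gather). The seed and the names test and result are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(outliers)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

df <- tibble(a = runif(20),
             b = runif(20),
             c = runif(20)) %>%
  # long format: one row per (letter, value) pair
  pivot_longer(everything(), names_to = "letter", values_to = "value")

result <- df %>%
  group_by(letter) %>%
  summarise(n = n(),                          # number of entries
            test = list(grubbs.test(value)),  # run the test once per group
            .groups = "drop") %>%
  mutate(G = signif(map_dbl(test, ~ .x$statistic[[1]]), 3),  # G statistic
         U = signif(map_dbl(test, ~ .x$statistic[[2]]), 3),  # U statistic
         grubbs = map_chr(test, ~ .x$alternative),           # alternative hypothesis
         p_grubbs = signif(map_dbl(test, ~ .x$p.value), 3)) %>%  # p-value
  select(-test) %>%  # drop the temporary list column
  arrange(p_grubbs)

print(result)
```

The columns match those of the table above. The values differ from run to run unless a seed is set.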
