Conducting a one-way ANOVA

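I have a dataset containing mesh opening measurements and the tools used to take those measurements. I want to run a one-way ANOVA on the data. Here is my code: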

df<-structure(list(MeasurementTool = c("Wedge", "Wedge", "Wedge", 
                                   "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", 
                                   "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", 
                                   "Wedge", "Wedge", "Wedge", "Weighted Wedge", "Weighted Wedge", 
                                   "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", 
                                   "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", 
                                   "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", 
                                   "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge", 
                                   "Weighted Wedge", "Weighted Wedge", "ICES Gauge", "ICES Gauge", 
                                   "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", 
                                   "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", 
                                   "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", 
                                   "ICES Gauge", "ICES Gauge", "ICES Gauge"), 
               MeshOpening = c(157L, 155L, 160L, 160L, 161L, 160L, 158L, 161L, 162L, 162L, 160L, 163L, 
                                158L, 160L, 161L, 165L, 164L, 158L, 164L, 163L, 159L, 158L, 165L, 
                                164L, 159L, 160L, 158L, 159L, 160L, 163L, 159L, 160L, 158L, 158L, 
                                158L, 162L, 160L, 159L, 159L, 159L, 159L, 159L, 159L, 155L, 156L, 
                                156L, 158L, 160L, 156L, 155L, 160L, 160L, 157L, 159L, 158L, 155L, 
                                158L, 157L, 156L, 158L)), row.names = c(NA, -60L), class = "data.frame") 

library(dplyr)

df$MeasurementTool <- as.factor(df$MeasurementTool)

group_by(df, 'MeasurementTool') %>% summarise(count = n(), mean = mean('MeshOpening', na.rm = TRUE), sd = sd('MeshOpening', na.rm = TRUE))

It gives me these warning messages:

Warning messages:

1: In mean.default("MeshOpening", na.rm = TRUE) : argument is not numeric or logical: returning NA

2: In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : NAs introduced by coercion

You have been misled by how dplyr::summarise works. It expects an R name (a.k.a. symbol), i.e. the column name without quotes around it:

group_by(df, 'MeasurementTool') %>% summarise(count = n(), mean = mean(MeshOpening, na.rm = TRUE), sd = sd(MeshOpening, na.rm = TRUE))
# A tibble: 1 × 4
  `"MeasurementTool"` count  mean    sd
  <chr>               <int> <dbl> <dbl>
1 MeasurementTool        60  159.  2.48
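The warnings in the question arise because, inside `summarise()`, `'MeshOpening'` is just a length-one character vector, not the column. A minimal base-R sketch of the same effect; the vector `mesh` holds hypothetical toy values, not the question's data:

```r
# A quoted name is an ordinary string, not a reference to a column
x <- "MeshOpening"

# mean() on a character vector warns
# ("argument is not numeric or logical: returning NA") and returns NA
m <- suppressWarnings(mean(x))

# sd() coerces the string with as.double(), which warns
# ("NAs introduced by coercion") and yields NA
s <- suppressWarnings(sd(x))

# With actual numbers (hypothetical toy values) both behave as expected
mesh <- c(157, 155, 160, 160)
m_ok <- mean(mesh)
s_ok <- sd(mesh)
```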

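In the pre-tidyverse days we often referred to columns by their names as character values, as you did, but many people seem to like treating column names as first-class objects, and that has become the norm in the tidyverse.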
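If you do need to pass column names as strings (for example, names stored in variables), both styles still work. A minimal sketch, assuming dplyr 1.0 or later for the `.data[[ ]]` pronoun; the toy data and the variables `grp` and `val` are hypothetical:

```r
library(dplyr)

# Hypothetical toy data with the same column names as the question
toy <- data.frame(
  MeasurementTool = factor(c("Wedge", "Wedge", "ICES Gauge", "ICES Gauge")),
  MeshOpening     = c(160, 162, 156, 158)
)

grp <- "MeasurementTool"  # column names held as character strings
val <- "MeshOpening"

# Base R style: index columns by character name with [[
base_means <- tapply(toy[[val]], toy[[grp]], mean)

# dplyr style: the .data pronoun turns a string into a column reference
tidy_means <- toy %>%
  group_by(.data[[grp]]) %>%
  summarise(mean = mean(.data[[val]], na.rm = TRUE))
```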

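Even better is to drop the quotes in the group_by call too, which not only removes the cause of the error but also gives you what you actually wanted, one row per measurement tool: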

group_by(df, MeasurementTool) %>% summarise(count = n(), 
                                          mean = mean(MeshOpening, na.rm = TRUE), 
                                          sd = sd(MeshOpening, na.rm = TRUE))
# A tibble: 3 × 4
  MeasurementTool count  mean    sd
  <fct>           <int> <dbl> <dbl>
1 ICES Gauge         20  158.  1.73
2 Wedge              20  161.  2.56
3 Weighted Wedge     20  160.  2.06
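The summary table is only a first step toward the one-way ANOVA the question asks for. A minimal base-R sketch using `aov()`; the data are rebuilt compactly from the question's `structure()` output, assuming (consistent with the counts of 20 per tool above) that the rows come in contiguous blocks in the original order:

```r
# Rebuild the question's data: 20 rows per tool, in the original order
df <- data.frame(
  MeasurementTool = factor(rep(c("Wedge", "Weighted Wedge", "ICES Gauge"), each = 20)),
  MeshOpening = c(157, 155, 160, 160, 161, 160, 158, 161, 162, 162,
                  160, 163, 158, 160, 161, 165, 164, 158, 164, 163,
                  159, 158, 165, 164, 159, 160, 158, 159, 160, 163,
                  159, 160, 158, 158, 158, 162, 160, 159, 159, 159,
                  159, 159, 159, 155, 156, 156, 158, 160, 156, 155,
                  160, 160, 157, 159, 158, 155, 158, 157, 156, 158)
)

# One-way ANOVA: does the mean mesh opening differ between tools?
fit <- aov(MeshOpening ~ MeasurementTool, data = df)
summary(fit)

# Optional follow-up: pairwise comparisons with Tukey's HSD
TukeyHSD(fit)
```

`summary(fit)` reports the F statistic and p-value for the tool effect; `TukeyHSD()` is one common choice for post-hoc pairwise comparisons, not the only one.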

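The group_by function arguably should throw an error, or at least a warning, when the value of its second argument is not interpreted as matching a column name. Instead, it silently treats the string as a constant, so every row lands in a single group labelled "MeasurementTool".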
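To see this directly: `group_by()` accepts arbitrary expressions, so a string literal is evaluated as a constant and becomes a new grouping column holding the same value on every row. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  MeasurementTool = c("Wedge", "Wedge", "ICES Gauge"),
  MeshOpening     = c(160, 162, 156)
)

# Quoted: the string is a constant expression, so all rows share one group
g_quoted <- group_by(toy, "MeasurementTool")
n_groups(g_quoted)
group_keys(g_quoted)

# Unquoted: the symbol refers to the column, giving one group per tool
g_symbol <- group_by(toy, MeasurementTool)
n_groups(g_symbol)
```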