"anova_test" function error (0 (non-NA) cases) and linear combination for two-way repeated measures ANOVA

I am trying to run a two-way repeated measures ANOVA in R using the anova_test function in the rstatix package. I am roughly following the tutorial found here. My data consist of several ant colonies ("Colony"), each split into 3 treatments ("Size"). I collected data ("g") over 8 timepoints ("Time"). I have uploaded a subset of my data on github, but here is a short summary:

 # A tibble: 24 x 6
   Species Colony Fragment Size  Time      g
   <fct>   <fct>  <fct>    <fct> <fct> <dbl>
 1 obs     5      5L       L     1     0.565
 2 obs     2      2L       L     2     0.002
 3 obs     8      8L       L     3     0.699
 4 obs     12     12L      L     4     0.257
 5 obs     12     12L      L     5     0.131
 6 obs     3      3L       L     6     0.014
 7 obs     10     10L      L     7     0.15 
 8 obs     12     12L      L     8     0.054
 9 obs     10     10M      M     1     0.448
10 obs     8      8M       M     2     0.135
# ... with 14 more rows

I have tried running the two-way repeated measures ANOVA in three different ways, using the following code:

aov <- df %>% anova_test(g ~ Size*Time + Error(Colony/(Size*Time)))
aov <- df %>% anova_test(dv=g, wid = Colony, within= c(Size,Time))
aov <- anova_test(data = df, dv=g, wid=Colony, within=c(Size, Time))

Each of them outputs the following error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

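I tried the same code on two sample datasets that are formatted similarly to mine, and the function ran fine (and each of the three calls gave the same result). Here are summaries of the sample datasets for reference: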

# A tibble: 6 x 4
  id    treatment time  score
  <fct> <fct>     <fct> <dbl>
1 7     ctr       t1       92
2 6     ctr       t2       65
3 12    ctr       t3       62
4 6     Diet      t1       76
5 9     Diet      t2       94
6 7     Diet      t3       87



# A tibble: 6 x 4
    len supp   dose    id
  <dbl> <fct> <dbl> <int>
1  21.5 OJ      0.5     2
2  14.5 OJ      1       9
3  22.4 OJ      2       3
4   4.2 VC      0.5     1
5  17.3 VC      1       4
6  29.5 VC      2      10

I have verified that my data do not contain any NA values: any(is.na(df)) returns FALSE.

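I came across a similar question, where a helpful poster suggested that this error might be caused by a linear combination rather than by NA values. I decided to check my data with lm(g ~ Colony+Time:Size, data=df), and I do indeed have a linear combination: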

Call:
lm(formula = g ~ Colony + Time:Size, data = df)

Coefficients:
(Intercept)      Colony1      Colony2      Colony3      Colony4      Colony5  Time1:SizeL  Time2:SizeL  Time3:SizeL  
   0.044167    -0.118549    -0.108424     0.076868     0.073243     0.034368     0.213000     0.351167     0.199833  
Time4:SizeL  Time5:SizeL  Time6:SizeL  Time7:SizeL  Time8:SizeL  Time1:SizeM  Time2:SizeM  Time3:SizeM  Time4:SizeM  
   0.060667     0.071333     0.005000     0.017000    -0.029167     0.239667     0.216333     0.174667     0.050500  
Time5:SizeM  Time6:SizeM  Time7:SizeM  Time8:SizeM  Time1:SizeS  Time2:SizeS  Time3:SizeS  Time4:SizeS  Time5:SizeS  
   0.069500     0.033167     0.011500    -0.003667    -0.015500     0.081167     0.020000     0.042500     0.026333  
Time6:SizeS  Time7:SizeS  Time8:SizeS  
  -0.014333    -0.000500           NA  

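However, I don't understand why. The Time8:SizeS category is essentially the same as all the other Time:Size combinations. If anyone could explain why I might be running into this error, or has a solution for how I can perform a two-way repeated measures ANOVA (with or without anova_test) on my data, I would greatly appreciate it!

Thanks in advance!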

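I would need to read the code of rstatix::anova_test again, but your design is fine and it is balanced; what causes all the problems is the extra columns. I suspect that, because of those columns, the pivoting gets messed up somewhere: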

library(rstatix)
library(dplyr)

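# Subset of the data posted with the question
df=read.csv("https://raw.githubusercontent.com/mwest9/sample_data/master/test_repeat_anova.csv")

# Colony and Time are read in as numbers; convert them to factors
df$Colony = factor(df$Colony)
df$Time = factor(df$Time)

# Keep only the columns the model needs, then run the ANOVA
df %>% select(g,Size,Time,Colony) %>%
  anova_test(g ~ Size*Time + Error(Colony/(Size*Time)))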

ANOVA Table (type III tests)

     Effect DFn DFd     F       p p<.05   ges
1      Size   2  10 4.098 0.05000       0.075
2      Time   7  35 5.428 0.00028     * 0.209
3 Size:Time  14  70 1.595 0.10200       0.099
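To check the claim that the design is balanced, and to cross-check the result without anova_test, something like the following base R sketch could be used. It runs on a synthetic data frame (the name toy and its random g values are made up for illustration and are not the asker's data); with the real data you would pass df instead.

```r
# Hypothetical balanced design: 6 colonies x 3 sizes x 8 time points
set.seed(1)
toy <- expand.grid(
  Colony = factor(1:6),
  Size   = factor(c("L", "M", "S")),
  Time   = factor(1:8)
)
toy$g <- rnorm(nrow(toy))  # made-up response values

# Balance check: every Colony x Size x Time cell should hold exactly one row
cell_counts <- table(toy$Colony, toy$Size, toy$Time)
is_balanced <- all(cell_counts == 1)

# Same model with stats::aov(); the Error() term declares Size and Time
# as within-Colony factors, so no reshaping of the data is involved
fit_aov <- aov(g ~ Size * Time + Error(Colony / (Size * Time)), data = toy)
summary(fit_aov)
```

For a balanced 6 x 3 x 8 layout the residual degrees of freedom in the Colony:Size, Colony:Time and Colony:Size:Time strata are 10, 35 and 70, which matches the DFd column in the table above.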

Note that it only reports the ANOVA table and not the sphericity tests that the documentation describes:

• Mauchly’s Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly’s test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.
• Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values.
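
As for the NA coefficient in the lm() output from the question: as far as I can tell, it does not point to a problem with the data. In g ~ Colony + Time:Size the interaction enters without its main effects, so R codes it with one indicator column per Time:Size cell. Those 24 columns add up to the intercept column, so one of them is redundant and lm() reports the last one (Time8:SizeS) as NA. A minimal sketch on synthetic data (again, toy and its values are hypothetical, not the asker's data):

```r
# Hypothetical balanced design: 6 colonies x 3 sizes x 8 time points
set.seed(1)
toy <- expand.grid(
  Colony = factor(1:6),
  Size   = factor(c("L", "M", "S")),
  Time   = factor(1:8)
)
toy$g <- rnorm(nrow(toy))  # made-up response values

# Same formula as in the question
fit <- lm(g ~ Colony + Time:Size, data = toy)
X <- model.matrix(fit)

n_cols    <- ncol(X)               # coefficients the formula asks for
x_rank    <- qr(X)$rank            # coefficients that can be estimated
n_aliased <- sum(is.na(coef(fit))) # coefficients lm() had to drop

# The Time:Size indicator columns sum to 1 in every row,
# i.e. they reproduce the intercept column
cell_cols <- grepl(":", colnames(X), fixed = TRUE)
cells_sum_to_intercept <- all(rowSums(X[, cell_cols]) == 1)

# Name(s) of the dropped coefficient(s)
names(coef(fit))[is.na(coef(fit))]
```

The model matrix has one more column than its rank, so exactly one coefficient is dropped even though the design is complete and balanced. Writing the formula as g ~ Colony + Time*Size should avoid the redundant column.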