从嵌套的摘要列表（aov（））中提取值到数据框中

Question

我正在运行在单个数据框中跨多个组进行简单的单向方差分析。

此处提供数据框：https://www.dropbox.com/s/6nsjk4l1pgiwal3/cut1.csv?dl=0

>download.file('https://www.dropbox.com/s/6nsjk4l1pgiwal3/cut1.csv?raw=1', destfile = "cut1.csv", method = "auto")

> data <- read.csv("cut1.csv")
> cut1 <- data %>% mutate(Plot = as.factor(Plot), Block = as.factor(Block), Cut = as.factor(Cut)) 

> str(cut1)
'data.frame':   160 obs. of  6 variables:
 $ Plot       : Factor w/ 16 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Block      : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
 $ Treatment  : Factor w/ 4 levels "AN","C","IU",..: 4 2 3 1 1 3 4 2 3 1 ...
 $ Cut        : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Measurement: Factor w/ 10 levels "ADF","Ash","Crude_Protein",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Value      : num  956 965 961 963 955 ...

我使用了 this SO question 中的一些代码来使 aov 函数应用于 Measurement 因子的每个级别：

anova_1<- sapply(unique(as.character(cut1$Measurement)),
                 function(meas)aov(Value~Treatment+Block,cut1,subset=(Measurement==meas)),
                 simplify=FALSE,USE.NAMES=TRUE)
summary_1 <- lapply(anova_1, summary)

我可以通过 summary_1 手动查看，但理想情况下我想做的是将 Measurement 因子的每个级别的 p 值提取到一个数据框中，然后我可以对其进行过滤，以便我只看哪些<0.5。然后我会运行 TukeyHSD 这些。

summary_1 看起来像这样（仅显示前 2 个列表）：

> str(summary_1)
List of 10
 $ Dry_matter   :List of 1
  ..$ :Classes ‘anova’ and 'data.frame':    3 obs. of  5 variables:
  .. ..$ Df     : num [1:3] 3 3 9
  .. ..$ Sum Sq : num [1:3] 359 167 612
  .. ..$ Mean Sq: num [1:3] 119.8 55.5 68
  .. ..$ F value: num [1:3] 1.761 0.816 NA
  .. ..$ Pr(>F) : num [1:3] 0.224 0.517 NA
  ..- attr(*, "class")= chr [1:2] "summary.aov" "listof"
 $ Crude_Protein:List of 1
  ..$ :Classes ‘anova’ and 'data.frame':    3 obs. of  5 variables:
  .. ..$ Df     : num [1:3] 3 3 9
  .. ..$ Sum Sq : num [1:3] 306 721 1606
  .. ..$ Mean Sq: num [1:3] 102 240 178
  .. ..$ F value: num [1:3] 0.572 1.347 NA
  .. ..$ Pr(>F) : num [1:3] 0.647 0.319 NA
  ..- attr(*, "class")= chr [1:2] "summary.aov" "listof"

我可以像这样从 summary_1 中的一个列表中提取 p 值：

> summary_1$OAH[[1]][,5][1]
[1] 0.4734992

但是，我不知道如何从所有嵌套列表中提取并放入数据框中。

非常感谢您的帮助。

Answer 1

您可以将包 broom 与 dplyr 结合使用，通过 Measurement 应用 Anova，并将输出分配给 data.frame整齐的格式。

library(broom)
library(dplyr)

summaries <- cut1 %>% group_by(Measurement) %>% 
        do(tidy(aov(Value ~ Treatment + Block, data = .)))

head(summaries)
#  Measurement      term    df      sumsq    meansq statistic    p.value
#       (fctr)     (chr) (dbl)      (dbl)     (dbl)     (dbl)      (dbl)
#1         ADF Treatment     3  41.416875 13.805625  3.097871 0.07138437
#2         ADF     Block     1   8.001125  8.001125  1.795388 0.20729351
#3         ADF Residuals    11  49.021375  4.456489        NA         NA
#4         Ash Treatment     3  38.511875 12.837292  1.051787 0.40840601
#5         Ash     Block     1  34.980125 34.980125  2.865998 0.11856463
#6         Ash Residuals    11 134.257375 12.205216        NA         NA

Answer 2

这是香草 R 中的解决方案：

# you can shorten your example -- download.file not necessary
cut1 <- read.csv('https://www.dropbox.com/s/6nsjk4l1pgiwal3/cut1.csv?raw=1') %>%
  mutate(Plot = as.factor(Plot), Block = as.factor(Block), Cut = as.factor(Cut)) 

# split-apply-combine strategy
do.call(rbind, lapply(split(cut1,cut1$Measurement),
                      function(x) with(x, summary(aov(Value ~ Treatment + Block)))[[1]]
                      )
        )

returns:

                          Df  Sum Sq Mean Sq F value  Pr(>F)  
ADF.Treatment              3   41.42   13.81  6.7088 0.01133 *
ADF.Block                  3   38.50   12.83  6.2366 0.01405 *
ADF.Residuals              9   18.52    2.06                  
Ash.Treatment              3   38.51   12.84  0.9162 0.47115  
Ash.Block                  3   43.13   14.38  1.0261 0.42602  
Ash.Residuals              9  126.11   14.01                  
Crude_Protein.Treatment    3  306.42  102.14  0.5723 0.64733  
Crude_Protein.Block        3  721.42  240.47  1.3473 0.31946  
Crude_Protein.Residuals    9 1606.39  178.49                  
D.Treatment                3    9.47    3.16  4.5530 0.03331 *
D.Block                    3    7.57    2.52  3.6383 0.05751 .
D.Residuals                9    6.24    0.69                  
Dry_matter.Treatment       3  359.39  119.80  1.7609 0.22432  
Dry_matter.Block           3  166.62   55.54  0.8164 0.51656  
Dry_matter.Residuals       9  612.27   68.03                  
ME.Treatment               3    0.24    0.08  4.5530 0.03331 *
ME.Block                   3    0.19    0.06  3.6383 0.05751 .
ME.Residuals               9    0.16    0.02                  
NCGD.Treatment             3 2777.57  925.86  4.5530 0.03331 *
NCGD.Block                 3 2219.55  739.85  3.6383 0.05751 .
NCGD.Residuals             9 1830.17  203.35                  
NDF.Treatment              3  355.91  118.64  6.8809 0.01050 *
NDF.Block                  3  336.70  112.23  6.5095 0.01239 *
NDF.Residuals              9  155.17   17.24                  
OAH.Treatment              3    1.41    0.47  0.9108 0.47350  
OAH.Block                  3    1.37    0.46  0.8850 0.48488  
OAH.Residuals              9    4.65    0.52                  
Sugar.Treatment            3   86.18   28.73  5.0212 0.02577 *
Sugar.Block                3   51.64   17.21  3.0085 0.08720 .
Sugar.Residuals            9   51.49    5.72                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

从嵌套的摘要列表（aov（））中提取值到数据框中

Extract values from nested list of summary(aov()) into a dataframe

r

list

nested-lists

anova

sapply