Linear models to test one factor's effect on multiple categories

Question

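I want to test whether a variable has an effect on the catch rates of different species, analysed group by group, but I'm struggling to work out how to do this in a concise, clear way. I have a dataset of about 400 catch rates, and the catch rates vary a lot between species. It looks like this: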

set.seed(42)
n <- 100

# organization has 2 * n values, so the shorter columns are recycled to 200 rows
df <- data.frame(organization = rep(LETTERS[1:4], n / 2),
                 species = rep(c("shark", "whale", "fish", "ray", "turtle"), each = 20),
                 gear = rep(c("l", "p", "l", "p", "l", "p", "l", "p", "l", "p"), each = 10),
                 rate = rnorm(n))

What I have tried so far is:

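library(dplyr)  # provides %>%, group_by(), do(), mutate() and filter()
library(broom)

df %>%
  group_by(species, gear) %>%
  do(tidy(lm(rate ~ organization, data = .))) %>%  # one model per species/gear group
  mutate(p.value = round(p.value, 3)) %>%
  filter(p.value < 0.05)  # keep only significant p-values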

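What I'd like to know is whether there is a simpler, more elegant way to test ONLY the effect of organization while still grouping by species and gear. Basically, species and gear have a large effect, and different species can't really be compared with each other. So I want to know whether organizations differ within the same species and gear.

Any help would be greatly appreciated!!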

Answer 1

This is a start, not a complete solution. Here we group by species only. You could group by species first, then by gear, and then combine the two with group_by(species, gear):

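library(tidyverse)
library(broom)

df %>% 
  mutate(species = as_factor(species)) %>%  # factor levels keep the order of appearance
  group_by(species) %>% 
  group_split() %>%                         # one data frame per species
  map_dfr(.f = function(df) {
    # fit one model per species and keep its one-row summary
    lm(rate ~ organization, data = df) %>% 
      glance() %>% 
      add_column(species = unique(df$species), .before = 1)
  })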

  species r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 shark      0.229         0.165  1.18      3.57   0.0234     3  -61.4 133.   141.     50.5          36    40
2 whale      0.192         0.124  1.03      2.84   0.0513     3  -55.6 121.   130.     37.8          36    40
3 fish       0.0980        0.0229 0.999     1.30   0.288      3  -54.6 119.   128.     35.9          36    40
4 ray        0.121         0.0481 0.783     1.66   0.194      3  -44.9  99.7  108.     22.1          36    40
5 turtle     0.0448       -0.0348 0.922     0.563  0.643      3  -51.4 113.   121.     30.6          36    40
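
The combination mentioned above, one model per species and gear, could be sketched roughly as follows. This is only an illustration under some assumptions: it uses nest() and map() rather than group_split(), the names org_tests, fit, and fit_summary are made up for this sketch, and the Benjamini-Hochberg adjustment via p.adjust() is an extra step that is not part of the original answer. The p.value returned by glance() is the overall F-test of the model, which here amounts to a test of organization within each species/gear group.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same simulated data as in the question (200 rows after recycling)
set.seed(42)
n <- 100
df <- data.frame(
  organization = rep(LETTERS[1:4], n / 2),
  species = rep(c("shark", "whale", "fish", "ray", "turtle"), each = 20),
  gear = rep(c("l", "p"), times = 5, each = 10),
  rate = rnorm(n)
)

org_tests <- df %>%
  group_by(species, gear) %>%
  nest() %>%                        # one row per species/gear combination
  ungroup() %>%
  mutate(
    # fit rate ~ organization separately within each combination
    fit = map(data, ~ lm(rate ~ organization, data = .x)),
    # one-row model summary, including the overall F-test
    fit_summary = map(fit, glance)
  ) %>%
  select(species, gear, fit_summary) %>%
  unnest(fit_summary) %>%
  # Assumption: adjust for running one test per group (not in the original answer)
  mutate(p.adjusted = p.adjust(p.value, method = "BH")) %>%
  select(species, gear, statistic, p.value, p.adjusted, df, df.residual, nobs)

org_tests
```

Each row of org_tests then holds a single organization test for one species/gear group, so there are no per-coefficient p-values to filter. With the simulated data each group has only 20 observations, so these tests have little power.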

Tags: statistics, model, r, dplyr, broom