Regression analysis by separate groups in R

My dataset has two grouping variables, shop and art. Here is a sample of the data:

reg <- read.csv("reg.csv")

# Reproducible sample of reg (output of dput()):
reg <- structure(list(shop = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("a", "c"), class = "factor"), art = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("b", "d"), class = "factor"), 
    Y = c(177L, 122L, 175L, 140L, 201L, 202L, 279L, 253L, 236L, 
    137L, 166L, 241L, 195L, 221L, 238L, 203L, 254L, 219L, 101L, 
    157L, 188L, 219L, 267L, 126L, 291L, 239L, 230L), x1 = c(1L, 
    0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 
    0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L), x2 = c(0L, 1L, 
    1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 
    1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L), x3 = c(0L, 0L, 0L, 
    1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 
    1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L), x4 = c(0L, 0L, 1L, 1L, 
    0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 
    0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L), x5 = c(0L, 0L, 1L, 1L, 0L, 
    0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
    1L, 0L, 0L, 1L, 1L, 1L, 0L), x6 = c(0L, 1L, 0L, 0L, 1L, 1L, 
    0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 
    1L, 1L, 1L, 1L, 0L, 1L), x7 = c(1L, 1L, 0L, 0L, 1L, 0L, 0L, 
    0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L, 1L, 1L, 1L, 0L), x8 = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 
    1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 
    0L, 1L, 0L, 1L), x9 = c(1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 
    0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 
    1L, 1L, 0L)), .Names = c("shop", "art", "Y", "x1", "x2", 
"x3", "x4", "x5", "x6", "x7", "x8", "x9"), class = "data.frame", row.names = c(NA, 
-27L))

I need to run the regression separately for each group. The formula is simple:

mymodel <- lm(Y ~ ., data = reg)
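On the whole data set, the `.` in `Y ~ .` expands to every column except `Y`, which includes the grouping factors `shop` and `art`, so the single model above pools all groups instead of fitting one per group. A minimal sketch of that expansion with `terms()`, using a small made-up data frame `toy` (hypothetical values, not taken from `reg`):

```r
# Hypothetical miniature version of reg: two grouping factors, a response, two predictors
toy <- data.frame(
  shop = factor(c("a", "a", "c", "c")),
  art  = factor(c("b", "b", "d", "d")),
  Y    = c(177, 122, 241, 195),
  x1   = c(1, 0, 1, 0),
  x2   = c(0, 1, 1, 1)
)

# terms() resolves `.` against the columns of `data`
dot_terms <- attr(terms(Y ~ ., data = toy), "term.labels")
dot_terms  # shop, art, x1, x2

# With the grouping columns removed first, `.` covers only the predictors
dot_terms_no_groups <- attr(terms(Y ~ ., data = toy[, c("Y", "x1", "x2")]), "term.labels")
dot_terms_no_groups  # x1, x2
```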

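That is, I have to fit the model separately for group a+b and for group c+d. In this example there are only 2 groups (a+b and c+d), where a and c are shop names and b and d are supplier code names.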
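Each group is therefore one observed combination of shop and supplier code. A minimal sketch of building such a combined key with base R's `interaction()`, on a hypothetical data frame `toy`; `drop = TRUE` keeps only the combinations that actually occur, and `sep = "+"` is chosen here only to mimic the a+b notation:

```r
# Hypothetical rows: shop a always appears with supplier b, shop c with supplier d
toy <- data.frame(
  shop = factor(c("a", "a", "c", "c", "c")),
  art  = factor(c("b", "b", "d", "d", "d"))
)

# One factor level per observed shop/art combination
toy$group <- interaction(toy$shop, toy$art, sep = "+", drop = TRUE)

levels(toy$group)  # "a+b" "c+d"
table(toy$group)   # 2 rows in a+b, 3 rows in c+d
```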

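How can I run the regression separately by group? In the real data there are dozens of groups, so splitting the dataset by hand is not feasible.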

This is a fairly common analysis pattern known as split-apply-combine, and it is easy to carry out in R:

library(tidyverse)
library(broom)

Define a wrapper function around lm:

my_lm <- function(df) {
  lm(Y ~ ., data = df)
}
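Since `my_lm()` only takes a data frame, it can be tried on one group by hand before mapping it over all of them. A minimal sketch on simulated data `sim` (hypothetical values; only the column layout mimics `reg`). The grouping columns have to be dropped first, which `nest()` does automatically for the nested data; otherwise `Y ~ .` also picks up `shop` and `art`, which have a single level inside a group, and `lm()` stops with a contrasts error:

```r
# Same wrapper as in the answer
my_lm <- function(df) {
  lm(Y ~ ., data = df)
}

# Simulated stand-in for reg (hypothetical): two shop/art groups of 10 rows,
# three 0/1 predictors built from simple repeating patterns
set.seed(1)
n <- 20
sim <- data.frame(
  shop = factor(rep(c("a", "c"), each = n / 2)),
  art  = factor(rep(c("b", "d"), each = n / 2)),
  x1   = rep(c(0, 1), length.out = n),
  x2   = rep(c(0, 0, 1, 1), length.out = n),
  x3   = rep(c(1, 0, 0), length.out = n)
)
sim$Y <- 200 + 20 * sim$x1 - 10 * sim$x2 + rnorm(n, sd = 15)

# One group by hand: the a+b rows, without the grouping columns
group_a <- subset(sim, shop == "a" & art == "b", select = -c(shop, art))
fit_a <- my_lm(group_a)
coef(fit_a)

# Keeping shop/art makes `.` include two single-level factors, so lm() fails
msg <- tryCatch(my_lm(subset(sim, shop == "a")),
                error = function(e) conditionMessage(e))
msg
```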

Run the model on each nested group:

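reg %>%                                  # reg: the data frame from the question
  group_by(art, shop) %>%
  nest() %>%                             # one row per group; other columns go into `data`
  mutate(fit = map(data, my_lm),         # fit lm() in each group
         tidy = map(fit, tidy)) %>%      # coefficient table for each model
  select(-fit, -data) %>%
  unnest(cols = tidy)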

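First you group the data by the grouping variables and nest each group's rows, fit the lm model to each group, extract the coefficients with tidy, drop the columns you no longer need, and then unnest. The result is: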

#output
   art    shop   term        estimate std.error statistic p.value
   <fctr> <fctr> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 b      a      (Intercept)    31.0      269      0.115   0.927 
 2 b      a      x1            109        153      0.714   0.605 
 3 b      a      x2            -23.0      223     -0.103   0.934 
 4 b      a      x3            -15.0      185     -0.0810  0.949 
 5 b      a      x4             31.0      333      0.0931  0.941 
 6 b      a      x5             81.0      457      0.177   0.888 
 7 b      a      x6             77.0      162      0.475   0.718 
 8 b      a      x7            -17.0      310     -0.0548  0.965 
 9 b      a      x8            -15.0      214     -0.0700  0.956 
10 b      a      x9             54.0      349      0.155   0.902 
11 d      c      (Intercept)   199         98.8    2.01    0.0907
12 d      c      x1            -15.7       60.8   -0.259   0.804 
13 d      c      x2              5.98      48.8    0.123   0.906 
14 d      c      x3              7.34      57.8    0.127   0.903 
15 d      c      x4            -20.1       53.8   -0.373   0.722 
16 d      c      x5            -43.2       41.8   -1.03    0.342 
17 d      c      x6              1.93      34.5    0.0560  0.957 
18 d      c      x7             31.9       40.5    0.787   0.461 
19 d      c      x8             36.0       45.9    0.786   0.462 
20 d      c      x9             10.7       49.7    0.215   0.837 

Many tutorials use the same or a similar approach to the one I posted in the comments.
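For comparison, the same split-apply-combine steps can also be written in base R without tidyverse or broom: `split()` for the split, `lapply()` for the model fits, and `do.call(rbind, ...)` to combine the coefficient tables. A minimal sketch on simulated data `sim` (hypothetical values); with the real data, `reg` would take its place. The column names `estimate`, `std.error`, `statistic` and `p.value` are picked here only to mirror the `tidy()` output:

```r
# Simulated stand-in for reg (hypothetical): two shop/art groups of 12 rows
set.seed(42)
n <- 24
sim <- data.frame(
  shop = factor(rep(c("a", "c"), each = n / 2)),
  art  = factor(rep(c("b", "d"), each = n / 2)),
  x1   = rep(c(0, 1), length.out = n),
  x2   = rep(c(0, 0, 1, 1), length.out = n)
)
sim$Y <- 200 + 25 * sim$x1 - 15 * sim$x2 + rnorm(n, sd = 10)

# Split: one data frame per observed shop/art combination (named "a.b", "c.d")
groups <- split(sim, list(sim$shop, sim$art), drop = TRUE)

# Apply: fit Y ~ . on the non-grouping columns of each group
fits <- lapply(groups, function(d) {
  lm(Y ~ ., data = d[, setdiff(names(d), c("shop", "art"))])
})

# Combine: stack the per-group coefficient tables into one data frame
coef_list <- lapply(names(fits), function(g) {
  cf <- summary(fits[[g]])$coefficients
  data.frame(group = g, term = rownames(cf),
             estimate = cf[, "Estimate"], std.error = cf[, "Std. Error"],
             statistic = cf[, "t value"], p.value = cf[, "Pr(>|t|)"],
             row.names = NULL)
})
result <- do.call(rbind, coef_list)
result
```

This version depends only on base R, and `fits` still holds the full model objects if other per-group summaries are needed later.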