Regression analysis with separate groups in R
In my dataset there are two grouping variables, shop and art.
Here is a sample of the data:
reg <- read.csv("reg.csv")
structure(list(shop = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("a", "c"), class = "factor"), art = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("b", "d"), class = "factor"),
Y = c(177L, 122L, 175L, 140L, 201L, 202L, 279L, 253L, 236L,
137L, 166L, 241L, 195L, 221L, 238L, 203L, 254L, 219L, 101L,
157L, 188L, 219L, 267L, 126L, 291L, 239L, 230L), x1 = c(1L,
0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L), x2 = c(0L, 1L,
1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L), x3 = c(0L, 0L, 0L,
1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L), x4 = c(0L, 0L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L), x5 = c(0L, 0L, 1L, 1L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 1L, 1L, 1L, 0L), x6 = c(0L, 1L, 0L, 0L, 1L, 1L,
0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L,
1L, 1L, 1L, 1L, 0L, 1L), x7 = c(1L, 1L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 0L), x8 = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L,
0L, 1L, 0L, 1L), x9 = c(1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L,
1L, 1L, 0L)), .Names = c("shop", "art", "Y", "x1", "x2",
"x3", "x4", "x5", "x6", "x7", "x8", "x9"), class = "data.frame", row.names = c(NA,
-27L))
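I need to run the regression separately for each group.
The formula is simple:
mymodel <- lm(Y ~ ., data = reg)
That is, I have to run the analysis separately for the group a+b and for the group c+d.
In this example there are only 2 groups (a+b and c+d), where a and c are shop names and b and d are supplier code names.
How can I run the regression separately by group? The real data has dozens of groups, so splitting the dataset by hand is not feasible.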
This is a fairly common analysis pattern called split-apply-combine, and it is easy to do in R:
library(tidyverse)
library(broom)
Create a function for lm:
# Fit Y on every other column of the data frame passed in
my_lm <- function(df) {
  lm(Y ~ ., data = df)
}
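A note on the `.` in the formula: it stands for every column of the data other than the response, so the grouping columns should not be present in the data frame handed to `my_lm`. The sketch below only illustrates how the formula expands; the data frame `toy` and its values are made up for the illustration and are not the data from the question.

```r
# Hypothetical toy data, used only to show how `Y ~ .` expands.
toy <- data.frame(
  shop = factor(c("a", "a", "c", "c")),
  Y    = c(10, 12, 20, 25),
  x1   = c(0, 1, 0, 1),
  x2   = c(1, 1, 0, 0)
)

# With the grouping column still present, `.` picks it up as a predictor.
with_group <- attr(terms(Y ~ ., data = toy), "term.labels")

# After dropping the grouping column, only the x variables remain.
without_group <- attr(
  terms(Y ~ ., data = toy[, c("Y", "x1", "x2")]),
  "term.labels"
)

print(with_group)
print(without_group)
```

Within a single group a grouping factor has only one observed level, and lm would typically stop with a contrasts error if it were left in as a predictor, which is why it matters that the grouping columns are kept out of the per-group data.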
Run the model on the nested data groups:
# reg is the data frame from the question
reg %>%
  group_by(art, shop) %>%
  nest() %>%
  mutate(fit = map(data, my_lm),
         tidy = map(fit, tidy)) %>%
  select(-fit, -data) %>%
  unnest(tidy)
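First you group the data by the desired variables and nest it, fit an lm model to each group, extract the coefficients with tidy, drop the columns that are no longer needed, and then unnest. The result is: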
#output
   art    shop   term        estimate std.error statistic p.value
   <fctr> <fctr> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 b      a      (Intercept)    31.0      269      0.115   0.927
 2 b      a      x1            109        153      0.714   0.605
 3 b      a      x2            -23.0      223     -0.103   0.934
 4 b      a      x3            -15.0      185     -0.0810  0.949
 5 b      a      x4             31.0      333      0.0931  0.941
 6 b      a      x5             81.0      457      0.177   0.888
 7 b      a      x6             77.0      162      0.475   0.718
 8 b      a      x7            -17.0      310     -0.0548  0.965
 9 b      a      x8            -15.0      214     -0.0700  0.956
10 b      a      x9             54.0      349      0.155   0.902
11 d      c      (Intercept)   199         98.8    2.01    0.0907
12 d      c      x1            -15.7       60.8   -0.259   0.804
13 d      c      x2              5.98      48.8    0.123   0.906
14 d      c      x3              7.34      57.8    0.127   0.903
15 d      c      x4            -20.1       53.8   -0.373   0.722
16 d      c      x5            -43.2       41.8   -1.03    0.342
17 d      c      x6              1.93      34.5    0.0560  0.957
18 d      c      x7             31.9       40.5    0.787   0.461
19 d      c      x8             36.0       45.9    0.786   0.462
20 d      c      x9             10.7       49.7    0.215   0.837
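For comparison, the same split-apply-combine steps can be written in base R without any packages. This is only a sketch under stated assumptions: the data frame `toy`, its simulated values, and the helper `fit_one` are hypothetical stand-ins, not the data or code from the question, and the column layout of the result simply mimics the output shown above.

```r
# Hypothetical toy data standing in for the question's data frame:
# two grouping columns (shop, art), a response Y and two predictors.
set.seed(1)
toy <- data.frame(
  shop = rep(c("a", "c"), each = 10),
  art  = rep(c("b", "d"), each = 10),
  x1   = rep(c(0, 1), times = 10),
  x2   = rep(c(0, 0, 1, 1, 0), times = 4)
)
toy$Y <- 100 + 20 * toy$x1 - 5 * toy$x2 + rnorm(nrow(toy), sd = 3)

group_cols <- c("shop", "art")

# Split: one data frame per observed shop/art combination.
pieces <- split(toy, toy[group_cols], drop = TRUE)

# Apply: fit lm on each piece after removing the grouping columns,
# so that `Y ~ .` only picks up the predictors.
fit_one <- function(piece) {
  fit <- lm(Y ~ ., data = piece[, !(names(piece) %in% group_cols)])
  est <- coef(summary(fit))
  data.frame(
    shop      = piece$shop[1],
    art       = piece$art[1],
    term      = rownames(est),
    estimate  = est[, "Estimate"],
    std.error = est[, "Std. Error"],
    statistic = est[, "t value"],
    p.value   = est[, "Pr(>|t|)"],
    row.names = NULL
  )
}

# Combine: stack the per-group coefficient tables into one data frame.
result <- do.call(rbind, lapply(pieces, fit_one))
rownames(result) <- NULL
print(result)
```

To try this on the real data, `toy` would be replaced by the data frame from the question; the grouping columns are listed once in `group_cols`, so adding another grouping variable should only require changing that vector and the identifying columns returned by `fit_one`.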
There are many tutorials that use the same approach as the one I posted in the comments, or a similar one.