How to run a cross-sectional regression for each quarter and industry?
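I need to run one multiple regression for each industry-quarter combination, e.g. a regression for financial firms in each time period (1999Q3, 1999Q4, 2000Q1, 2000Q2, ...), and likewise for utility firms, retail firms, food firms, and so on.
I need to run the regressions and then collect all of the coefficients into a list, so that I can append that list back to the original data frame and have the corresponding coefficients next to each row.
For example, in the dataset below I want to run the regression Y = x1 + x2 + x3. I tried using for loops and nested loops and collecting the coefficients into a matrix, but I can't seem to get it to work (I'm new to R!).
I have a panel dataset like the one below, containing company name, industry, calendar quarter, and some variables: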
`Company Name` Industry Quater Y x1 x2 x3
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
A & M FOOD SE Food 1985Q1 2.97 16.4 9.23 2.22
A & M FOOD SE Food 1985Q2 5.00 40.2 11.2 3.94
A & M FOOD SE Food 1985Q3 5.71 40.7 12.5 4.66
A & M FOOD SE Food 1985Q4 3.85 39.5 13.0 2.79
A & M FOOD SE Food 1986Q1 3.12 38.9 13.2 1.98
A.A. IMPORTIN Food 1985Q4 12.5 14.0 6.66 0.005
A.A. IMPORTIN Food 1986Q1 13.3 15.0 6.74 0.513
A.A. IMPORTIN Food 1986Q2 13.2 15.0 6.71 0.031
A.A. IMPORTIN Food 1986Q3 13.5 15.2 6.86 0.111
C.D. JUMPINGS Retail 1986Q4 13.1 14.6 7.46 0.241
C.D. JUMPINGS Retail 1985Q4 12.5 14.0 6.66 0.005
C.D. JUMPINGS Retail 1986Q1 13.3 15.0 6.74 0.513
C.D. JUMPINGS Retail 1986Q2 13.2 15.0 6.71 0.031
Kmart Retail 1986Q3 13.5 15.2 6.86 0.111
Kmart Retail 1986Q4 13.1 14.6 7.46 0.241
Kmart Retail 1985Q4 12.5 14.0 6.66 0.005
Kmart Retail 1986Q1 13.3 15.0 6.74 0.513
Kmart Retail 1986Q2 13.2 15.0 6.71 0.031
Kmart Retail 1986Q3 13.5 15.2 6.86 0.111
Thank you very much. I have tried everything from odd functions in the plm library to lapply.
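A simple base R approach is split. split divides the data.frame given as its first argument into a list of data.frames, one per level of its second argument. So for your example data, split(data, data$`Company Name`) produces a list of 4 data.frames.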
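To make the shape of that result concrete, here is a minimal sketch of what split returns. The toy data frame and its column names (company, y) are made up for illustration and are not the question's data.

```r
# Toy panel (hypothetical values, not the question's data)
toy <- data.frame(
  company = c("A", "A", "B", "B", "B", "C"),
  y       = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
)

# split() returns a named list with one data.frame per level of the grouping vector
groups <- split(toy, toy$company)

names(groups)          # group labels: "A" "B" "C"
sapply(groups, nrow)   # rows per group: A = 2, B = 3, C = 1
```

Each element of groups is an ordinary data.frame, so anything that accepts a data argument (such as lm) can be applied to it directly.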
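From there, we can use lapply to apply the lm function to each subset of the data. Because lm has many arguments, it is easier to define a new function of just x (an anonymous, or lambda, function) that fills in the formula and passes x as data.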
lapply(split(data,data$`Company Name`),
function(x) lm( Y ~ x1 + x2 + x3, data = x))
The printed list of models is a bit messy; you can use sapply to simplify the result to a matrix of coefficients.
t(sapply(split(data,data$`Company Name`),
function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
)
)
# (Intercept) x1 x2 x3
#A & M FOOD SE 0.5773632 0.01586041 -3.662652e-05 0.9607874
#A.A. IMPORTIN -3.6117236 0.64295788 1.067509e+00 0.1410264
#C.D. JUMPINGS 1.7123480 0.68601589 1.775447e-01 0.1964184
#Kmart 0.2591970 0.78346288 1.880233e-01 0.0525099
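The question also asks how to get the coefficients back onto the original rows. One possible way, sketched here under assumptions, is to turn the coefficient matrix into a data.frame with a key column and merge it with the data. The toy data, the column names (company, x1, x2, y) and the b_ prefix are all hypothetical choices made for this example.

```r
# Toy panel (hypothetical values); each company has 5 rows so lm() can
# estimate an intercept and two slopes
set.seed(1)
toy <- data.frame(
  company = rep(c("A", "B"), each = 5),
  x1 = rnorm(10),
  x2 = rnorm(10)
)
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(10, sd = 0.1)

# One row of coefficients per company, built the same way as the sapply() call above
coefs <- t(sapply(split(toy, toy$company),
                  function(d) coef(lm(y ~ x1 + x2, data = d))))

# Convert the matrix to a data.frame, rename the columns so they do not
# clash with the regressors, and add the group label as a key column
coef_df <- as.data.frame(coefs)
names(coef_df) <- c("b_intercept", "b_x1", "b_x2")
coef_df$company <- rownames(coefs)

# Join the coefficients back onto every original row of that company
toy_with_coefs <- merge(toy, coef_df, by = "company", all.x = TRUE)
head(toy_with_coefs)
```

Note that merge sorts the result by the key, so the row order can differ from the original data frame.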
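If you want to group by two variables, `Company Name` and Quater (the quarter column, spelled this way in the data), just supply a list to split. For the industry-quarter regressions in the question, pass data$Industry in place of data$`Company Name`.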
t(sapply(split(data,list(data$`Company Name`, data$Quater)),
function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
)
)
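I can't show the output of that call on the sample data, because many of the company-quarter combinations are empty and lm stops with an error on a group that has no rows. Hopefully your real dataset is complete. With the empty groups filtered out first, it should look like this: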
t(sapply(Filter(function(x) nrow(x) > 0, split(data,list(data$`Company Name`, data$Quater))),
function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
)
)
# (Intercept) x1 x2 x3
#A & M FOOD SE.1985Q1 2.97 NA NA NA
#A & M FOOD SE.1985Q2 5.00 NA NA NA
#A & M FOOD SE.1985Q3 5.71 NA NA NA
#A & M FOOD SE.1985Q4 3.85 NA NA NA
#A.A. IMPORTIN.1985Q4 12.50 NA NA NA
#C.D. JUMPINGS.1985Q4 12.50 NA NA NA
#Kmart.1985Q4 12.50 NA NA NA
#A & M FOOD SE.1986Q1 3.12 NA NA NA
#A.A. IMPORTIN.1986Q1 13.30 NA NA NA
#C.D. JUMPINGS.1986Q1 13.30 NA NA NA
#Kmart.1986Q1 13.30 NA NA NA
#A.A. IMPORTIN.1986Q2 13.20 NA NA NA
#C.D. JUMPINGS.1986Q2 13.20 NA NA NA
#Kmart.1986Q2 13.20 NA NA NA
#A.A. IMPORTIN.1986Q3 13.50 NA NA NA
#Kmart.1986Q3 13.50 NA NA NA
#C.D. JUMPINGS.1986Q4 13.10 NA NA NA
#Kmart.1986Q4 13.10 NA NA NA
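In the output above the slopes are NA because almost every company-quarter group holds a single row, so only the intercept can be estimated. Grouping by industry and quarter, as the question asks, pools several companies into each group. The following is a rough sketch under assumptions: the toy data is invented, and the min_obs threshold is a hypothetical choice rather than anything from the original answer.

```r
# Toy panel (hypothetical values): 2 industries x 2 quarters, with one
# industry-quarter cell deliberately too small to fit the model
set.seed(42)
toy <- data.frame(
  Industry = c(rep("Food", 12), rep("Retail", 8)),
  Quater   = c(rep("1985Q1", 6), rep("1985Q2", 6),
               rep("1985Q1", 6), rep("1985Q2", 2)),
  x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20)
)
toy$Y <- 1 + toy$x1 + 0.5 * toy$x2 - toy$x3 + rnorm(20, sd = 0.1)

# drop = TRUE discards Industry-Quater combinations that have no rows
cells <- split(toy, list(toy$Industry, toy$Quater), drop = TRUE)

# Keep only cells with more observations than coefficients (4 here);
# the threshold is an assumption, pick whatever your analysis requires
min_obs <- 5
cells <- Filter(function(d) nrow(d) >= min_obs, cells)

# One cross-sectional regression per remaining industry-quarter cell
coefs <- t(sapply(cells, function(d) coef(lm(Y ~ x1 + x2 + x3, data = d))))
coefs
```

The row names of coefs have the form Industry.Quater, which can be split back into two key columns if the coefficients need to be merged onto the original data.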
Data
data <- structure(list(`Company Name` = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A & M FOOD SE",
"A.A. IMPORTIN", "C.D. JUMPINGS", "Kmart"), class = "factor"),
Industry = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Food",
"Retail"), class = "factor"), Quater = structure(c(1L, 2L,
3L, 4L, 5L, 4L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 7L, 8L, 4L, 5L,
6L, 7L), .Label = c("1985Q1", "1985Q2", "1985Q3", "1985Q4",
"1986Q1", "1986Q2", "1986Q3", "1986Q4"), class = "factor"),
Y = c(2.97, 5, 5.71, 3.85, 3.12, 12.5, 13.3, 13.2, 13.5,
13.1, 12.5, 13.3, 13.2, 13.5, 13.1, 12.5, 13.3, 13.2, 13.5
), x1 = c(16.4, 40.2, 40.7, 39.5, 38.9, 14, 15, 15, 15.2,
14.6, 14, 15, 15, 15.2, 14.6, 14, 15, 15, 15.2), x2 = c(9.23,
11.2, 12.5, 13, 13.2, 6.66, 6.74, 6.71, 6.86, 7.46, 6.66,
6.74, 6.71, 6.86, 7.46, 6.66, 6.74, 6.71, 6.86), x3 = c(2.22,
3.94, 4.66, 2.79, 1.98, 0.005, 0.513, 0.031, 0.111, 0.241,
0.005, 0.513, 0.031, 0.111, 0.241, 0.005, 0.513, 0.031, 0.111
)), class = "data.frame", row.names = c(NA, -19L))
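An approach using broom:
library(modelr)
library(tidyverse)
library(broom)
# assumption: df is the data above with syntactic column names,
# so that `Company Name` becomes Company.Name
df <- setNames(data, make.names(names(data)))
# one row per company-quarter; each group's rows go into the list-column `data`
nested <- df %>%
  group_by(Company.Name, Quater) %>%
  nest()
# specify the regression to fit within each group
country_model <- function(df) {
  lm(Y ~ x1 + x2 + x3, data = df)
}
# fit the models and unnest the coefficients (only Intercept and one x1 here because most are NA)
nested %>%
  mutate(model = map(data, country_model),
         tidy = map(model, broom::tidy)) %>%
  unnest(tidy)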
# A tibble: 19 x 9
# Groups: Company.Name, Quater [18]
Company.Name Quater data model term estimate std.error statistic p.value
<chr> <chr> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 A & M FOOD SE 1985Q1 <tibble [1 x 5]> <lm> (Intercept) 2.97e+ 0 NaN NaN NaN
2 A & M FOOD SE 1985Q2 <tibble [1 x 5]> <lm> (Intercept) 5.00e+ 0 NaN NaN NaN
3 A & M FOOD SE 1985Q3 <tibble [1 x 5]> <lm> (Intercept) 5.71e+ 0 NaN NaN NaN
4 A & M FOOD SE 1985Q4 <tibble [1 x 5]> <lm> (Intercept) 3.85e+ 0 NaN NaN NaN
5 A & M FOOD SE 1986Q1 <tibble [1 x 5]> <lm> (Intercept) 3.12e+ 0 NaN NaN NaN
6 A.A. IMPORTIN 1985Q4 <tibble [1 x 5]> <lm> (Intercept) 1.25e+ 1 NaN NaN NaN
7 A.A. IMPORTIN 1986Q1 <tibble [1 x 5]> <lm> (Intercept) 1.33e+ 1 NaN NaN NaN
8 A.A. IMPORTIN 1986Q2 <tibble [1 x 5]> <lm> (Intercept) 1.32e+ 1 NaN NaN NaN
9 A.A. IMPORTIN 1986Q3 <tibble [1 x 5]> <lm> (Intercept) 1.35e+ 1 NaN NaN NaN
10 C.D. JUMPINGS 1986Q4 <tibble [1 x 5]> <lm> (Intercept) 1.31e+ 1 NaN NaN NaN
11 C.D. JUMPINGS 1985Q4 <tibble [1 x 5]> <lm> (Intercept) 1.25e+ 1 NaN NaN NaN
12 C.D. JUMPINGS 1986Q1 <tibble [1 x 5]> <lm> (Intercept) 1.33e+ 1 NaN NaN NaN
13 C.D. JUMPINGS 1986Q2 <tibble [1 x 5]> <lm> (Intercept) 1.32e+ 1 NaN NaN NaN
14 Kmart 1986Q3 <tibble [2 x 5]> <lm> (Intercept) 1.35e+ 1 NaN NaN NaN
15 Kmart 1986Q3 <tibble [2 x 5]> <lm> x1 3.81e-16 NaN NaN NaN
16 Kmart 1986Q4 <tibble [1 x 5]> <lm> (Intercept) 1.31e+ 1 NaN NaN NaN
17 Kmart 1985Q4 <tibble [1 x 5]> <lm> (Intercept) 1.25e+ 1 NaN NaN NaN
18 Kmart 1986Q1 <tibble [1 x 5]> <lm> (Intercept) 1.33e+ 1 NaN NaN NaN
19 Kmart 1986Q2 <tibble [1 x 5]> <lm> (Intercept) 1.32e+ 1 NaN NaN NaN