How to run a cross-sectional regression for each quarter and industry?

Question

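I need to run a multiple regression for each industry-quarter combination: for example, a regression on financial firms for each period (1999Q3, 1999Q4, 2000Q1, 2000Q2, ...), and likewise for utilities, retailers, food companies, and so on.

After running the regressions, I want to collect all the coefficients into a list so that I can append them back to the original data frame, giving each row its corresponding coefficients.

For example, in the dataset below I want to run the regression Y = x1 + x2 + x3. I have tried for loops and nested loops, collecting the coefficients into a matrix, but I can't get it to work (I am new to R!).

I have a panel dataset like the one below, with company name, industry, calendar quarter, and some variables: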

              `Company Name`  Industry  Quater           Y                x1               x2               x3
               <chr>          <chr>     <chr>            <dbl>            <dbl>            <dbl>            <dbl>            
              A & M FOOD SE  Food       1985Q1           2.97             16.4             9.23             2.22              
              A & M FOOD SE  Food       1985Q2           5.00             40.2             11.2             3.94              
              A & M FOOD SE  Food       1985Q3           5.71             40.7             12.5             4.66              
              A & M FOOD SE  Food       1985Q4           3.85             39.5             13.0             2.79              
              A & M FOOD SE  Food       1986Q1           3.12             38.9             13.2             1.98              
              A.A. IMPORTIN  Food       1985Q4           12.5             14.0             6.66             0.005             
              A.A. IMPORTIN  Food       1986Q1           13.3             15.0             6.74             0.513              
              A.A. IMPORTIN  Food       1986Q2           13.2             15.0             6.71             0.031             
              A.A. IMPORTIN  Food       1986Q3           13.5             15.2             6.86             0.111             
              C.D. JUMPINGS  Retail     1986Q4           13.1             14.6             7.46             0.241
              C.D. JUMPINGS  Retail     1985Q4           12.5             14.0             6.66             0.005             
              C.D. JUMPINGS  Retail     1986Q1           13.3             15.0             6.74             0.513              
              C.D. JUMPINGS  Retail     1986Q2           13.2             15.0             6.71             0.031             
              Kmart          Retail     1986Q3           13.5             15.2             6.86             0.111
              Kmart          Retail     1986Q4           13.1             14.6             7.46             0.241
              Kmart          Retail     1985Q4           12.5             14.0             6.66             0.005             
              Kmart          Retail     1986Q1           13.3             15.0             6.74             0.513              
              Kmart          Retail     1986Q2           13.2             15.0             6.71             0.031             
              Kmart          Retail     1986Q3           13.5             15.2             6.86             0.111

Thank you all very much. I have tried everything from odd functions in the plm library to lapply.

Answer 1

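A simple base R approach is split. split divides the data.frame given as its first argument into a list of data.frames, one per level of its second argument. So for your example data, split(data, data$`Company Name`) produces a list of 4 data.frames.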
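To make the behaviour of split concrete, here is a minimal sketch on a made-up data frame. The names toy, firm, and pieces are hypothetical and are not part of the asker's data.

```r
# Hypothetical toy data, not the asker's panel
toy <- data.frame(
  firm = c("A", "A", "B", "B", "B", "C"),
  y    = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
)

# One data.frame per distinct value of firm
pieces <- split(toy, toy$firm)

names(pieces)          # one list element per firm: "A" "B" "C"
sapply(pieces, nrow)   # rows in each piece: A = 2, B = 3, C = 1
```

Each element of pieces is an ordinary data.frame, so any function that accepts a data argument can be applied to it.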

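From there, we can use lapply to apply the lm function to each subset of the data. Because lm takes many arguments, it is easier to define a new function of x alone (an anonymous, or lambda, function).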

lapply(split(data,data$`Company Name`),
       function(x) lm( Y ~ x1 + x2 + x3, data = x))

The format is a bit messy; you can use sapply to simplify the result.

t(sapply(split(data,data$`Company Name`),
         function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
         )
  )
#              (Intercept)         x1            x2        x3
#A & M FOOD SE   0.5773632 0.01586041 -3.662652e-05 0.9607874
#A.A. IMPORTIN  -3.6117236 0.64295788  1.067509e+00 0.1410264
#C.D. JUMPINGS   1.7123480 0.68601589  1.775447e-01 0.1964184
#Kmart           0.2591970 0.78346288  1.880233e-01 0.0525099

If you want to do this for two variables, `Company Name` and Quater (the quarter column, spelled as in the data), just pass a list to split.

t(sapply(split(data,list(data$`Company Name`, data$Quater)),
         function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
         )
  )

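I can't show the output of that call because many of the company-quarter combinations are empty. Hopefully your dataset is more complete. After filtering out the empty groups, it should look something like this: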

t(sapply(Filter(function(x) nrow(x) > 0, split(data,list(data$`Company Name`, data$Quater))),
          function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
          )
   )
#                     (Intercept) x1 x2 x3
#A & M FOOD SE.1985Q1        2.97 NA NA NA
#A & M FOOD SE.1985Q2        5.00 NA NA NA
#A & M FOOD SE.1985Q3        5.71 NA NA NA
#A & M FOOD SE.1985Q4        3.85 NA NA NA
#A.A. IMPORTIN.1985Q4       12.50 NA NA NA
#C.D. JUMPINGS.1985Q4       12.50 NA NA NA
#Kmart.1985Q4               12.50 NA NA NA
#A & M FOOD SE.1986Q1        3.12 NA NA NA
#A.A. IMPORTIN.1986Q1       13.30 NA NA NA
#C.D. JUMPINGS.1986Q1       13.30 NA NA NA
#Kmart.1986Q1               13.30 NA NA NA
#A.A. IMPORTIN.1986Q2       13.20 NA NA NA
#C.D. JUMPINGS.1986Q2       13.20 NA NA NA
#Kmart.1986Q2               13.20 NA NA NA
#A.A. IMPORTIN.1986Q3       13.50 NA NA NA
#Kmart.1986Q3               13.50 NA NA NA
#C.D. JUMPINGS.1986Q4       13.10 NA NA NA
#Kmart.1986Q4               13.10 NA NA NA
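The slopes above are NA because almost every company-quarter group holds a single row, and lm cannot estimate an intercept plus three slopes from one observation. A minimal sketch of two possible guards follows: the drop = TRUE argument of split, which discards factor combinations that never occur, and a Filter on a minimum group size. The toy data, the column names g1 and g2, and the threshold min_obs are assumptions made for illustration only.

```r
set.seed(1)
# Hypothetical toy panel: groups "a" and "b" have 10 rows each, "c" only 2
toy <- data.frame(
  g1 = factor(c(rep("a", 10), rep("b", 10), rep("c", 2))),
  g2 = factor(rep("q1", 22), levels = c("q1", "q2")),  # "q2" never occurs
  x1 = rnorm(22), x2 = rnorm(22), x3 = rnorm(22)
)
toy$Y <- 1 + 2 * toy$x1 - toy$x2 + 0.5 * toy$x3 + rnorm(22, sd = 0.1)

# drop = TRUE discards combinations of g1 and g2 that have no rows
groups <- split(toy, list(toy$g1, toy$g2), drop = TRUE)

# Keep only groups with more rows than coefficients (intercept + 3 slopes)
min_obs <- 5   # assumed threshold, adjust to your model
groups <- Filter(function(d) nrow(d) >= min_obs, groups)

# One row of coefficients per remaining group
coefs <- t(sapply(groups, function(d) coef(lm(Y ~ x1 + x2 + x3, data = d))))
coefs
```

Here group "c" is removed by the size filter, so coefs has one row each for "a.q1" and "b.q1".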

Data

data <- structure(list(`Company Name` = structure(c(1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A & M FOOD SE", 
"A.A. IMPORTIN", "C.D. JUMPINGS", "Kmart"), class = "factor"), 
    Industry = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Food", 
    "Retail"), class = "factor"), Quater = structure(c(1L, 2L, 
    3L, 4L, 5L, 4L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 7L, 8L, 4L, 5L, 
    6L, 7L), .Label = c("1985Q1", "1985Q2", "1985Q3", "1985Q4", 
    "1986Q1", "1986Q2", "1986Q3", "1986Q4"), class = "factor"), 
    Y = c(2.97, 5, 5.71, 3.85, 3.12, 12.5, 13.3, 13.2, 13.5, 
    13.1, 12.5, 13.3, 13.2, 13.5, 13.1, 12.5, 13.3, 13.2, 13.5
    ), x1 = c(16.4, 40.2, 40.7, 39.5, 38.9, 14, 15, 15, 15.2, 
    14.6, 14, 15, 15, 15.2, 14.6, 14, 15, 15, 15.2), x2 = c(9.23, 
    11.2, 12.5, 13, 13.2, 6.66, 6.74, 6.71, 6.86, 7.46, 6.66, 
    6.74, 6.71, 6.86, 7.46, 6.66, 6.74, 6.71, 6.86), x3 = c(2.22, 
    3.94, 4.66, 2.79, 1.98, 0.005, 0.513, 0.031, 0.111, 0.241, 
    0.005, 0.513, 0.031, 0.111, 0.241, 0.005, 0.513, 0.031, 0.111
    )), class = "data.frame", row.names = c(NA, -19L))
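With data in this shape, the grouping the question actually asks about (industry by quarter, rather than company by quarter) and the step of attaching coefficients back to the original rows could be sketched as below. This is only an illustration under stated assumptions: the toy panel is simulated, the helper fit_cell and the output columns b0, b_x1, b_x2, b_x3 are hypothetical names, and only the column names Industry, Quater, Y, x1, x2, x3 are taken from the question.

```r
set.seed(42)
# Hypothetical balanced toy panel: 2 industries x 2 quarters x 8 firms each
toy <- expand.grid(firm = 1:8,
                   Industry = c("Food", "Retail"),
                   Quater = c("1985Q4", "1986Q1"),
                   stringsAsFactors = FALSE)
n <- nrow(toy)
toy$x1 <- rnorm(n)
toy$x2 <- rnorm(n)
toy$x3 <- rnorm(n)
toy$Y  <- 1 + 0.5 * toy$x1 + 0.2 * toy$x2 - 0.3 * toy$x3 + rnorm(n, sd = 0.1)

# Fit one regression on a single industry-quarter cell and
# return its coefficients as a one-row data frame
fit_cell <- function(d) {
  b <- coef(lm(Y ~ x1 + x2 + x3, data = d))
  data.frame(Industry = d$Industry[1], Quater = d$Quater[1],
             b0 = b[["(Intercept)"]], b_x1 = b[["x1"]],
             b_x2 = b[["x2"]], b_x3 = b[["x3"]])
}

cells <- split(toy, list(toy$Industry, toy$Quater), drop = TRUE)
coefs <- do.call(rbind, lapply(cells, fit_cell))

# Attach each cell's coefficients to every firm row in that cell
result <- merge(toy, coefs, by = c("Industry", "Quater"))
head(result)
```

Note that merge may reorder the rows; if the original order matters, add an index column before merging and sort by it afterwards.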

Answer 2

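An approach with broom:

library(tidyverse)  # dplyr, tidyr (nest/unnest) and purrr (map)
library(broom)

# Assumption: df holds the same data with syntactic column names
# (Company.Name rather than `Company Name`)
nested <- df %>% 
  group_by(Company.Name, Quater) %>% 
  nest()

# specify the regression fitted to each group
group_model <- function(df) {
  lm(Y ~ x1 + x2 + x3, data = df)
}

# fit one model per group, then unnest the coefficients
# (only the intercept and a single x1 appear here because most are NA)
nested %>% 
  mutate(model = map(data, group_model),
         tidy = map(model, broom::tidy)) %>% 
  unnest(tidy)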

# A tibble: 19 x 9
# Groups:   Company.Name, Quater [18]
   Company.Name  Quater data             model  term        estimate std.error statistic p.value
   <chr>         <chr>  <list>           <list> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 A & M FOOD SE 1985Q1 <tibble [1 x 5]> <lm>   (Intercept) 2.97e+ 0       NaN       NaN     NaN
 2 A & M FOOD SE 1985Q2 <tibble [1 x 5]> <lm>   (Intercept) 5.00e+ 0       NaN       NaN     NaN
 3 A & M FOOD SE 1985Q3 <tibble [1 x 5]> <lm>   (Intercept) 5.71e+ 0       NaN       NaN     NaN
 4 A & M FOOD SE 1985Q4 <tibble [1 x 5]> <lm>   (Intercept) 3.85e+ 0       NaN       NaN     NaN
 5 A & M FOOD SE 1986Q1 <tibble [1 x 5]> <lm>   (Intercept) 3.12e+ 0       NaN       NaN     NaN
 6 A.A. IMPORTIN 1985Q4 <tibble [1 x 5]> <lm>   (Intercept) 1.25e+ 1       NaN       NaN     NaN
 7 A.A. IMPORTIN 1986Q1 <tibble [1 x 5]> <lm>   (Intercept) 1.33e+ 1       NaN       NaN     NaN
 8 A.A. IMPORTIN 1986Q2 <tibble [1 x 5]> <lm>   (Intercept) 1.32e+ 1       NaN       NaN     NaN
 9 A.A. IMPORTIN 1986Q3 <tibble [1 x 5]> <lm>   (Intercept) 1.35e+ 1       NaN       NaN     NaN
10 C.D. JUMPINGS 1986Q4 <tibble [1 x 5]> <lm>   (Intercept) 1.31e+ 1       NaN       NaN     NaN
11 C.D. JUMPINGS 1985Q4 <tibble [1 x 5]> <lm>   (Intercept) 1.25e+ 1       NaN       NaN     NaN
12 C.D. JUMPINGS 1986Q1 <tibble [1 x 5]> <lm>   (Intercept) 1.33e+ 1       NaN       NaN     NaN
13 C.D. JUMPINGS 1986Q2 <tibble [1 x 5]> <lm>   (Intercept) 1.32e+ 1       NaN       NaN     NaN
14 Kmart         1986Q3 <tibble [2 x 5]> <lm>   (Intercept) 1.35e+ 1       NaN       NaN     NaN
15 Kmart         1986Q3 <tibble [2 x 5]> <lm>   x1          3.81e-16       NaN       NaN     NaN
16 Kmart         1986Q4 <tibble [1 x 5]> <lm>   (Intercept) 1.31e+ 1       NaN       NaN     NaN
17 Kmart         1985Q4 <tibble [1 x 5]> <lm>   (Intercept) 1.25e+ 1       NaN       NaN     NaN
18 Kmart         1986Q1 <tibble [1 x 5]> <lm>   (Intercept) 1.33e+ 1       NaN       NaN     NaN
19 Kmart         1986Q2 <tibble [1 x 5]> <lm>   (Intercept) 1.32e+ 1       NaN       NaN     NaN
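The long output above has one row per coefficient. To get one column per coefficient and join it back onto the firm-level rows, grouped by industry and quarter as in the question, something like the following might work. It is a sketch that swaps in dplyr's group_modify and tidyr's pivot_wider in place of the nest/map steps above; the toy panel is simulated and the b_ prefix on the new columns is an arbitrary choice.

```r
library(dplyr)
library(tidyr)
library(broom)

set.seed(7)
# Hypothetical balanced toy panel: 2 industries x 2 quarters x 8 firms each
toy <- expand.grid(firm = 1:8,
                   Industry = c("Food", "Retail"),
                   Quater = c("1985Q4", "1986Q1"),
                   stringsAsFactors = FALSE)
n <- nrow(toy)
toy$x1 <- rnorm(n)
toy$x2 <- rnorm(n)
toy$x3 <- rnorm(n)
toy$Y  <- 1 + 0.5 * toy$x1 + 0.2 * toy$x2 - 0.3 * toy$x3 + rnorm(n, sd = 0.1)

# One regression per industry-quarter, coefficients spread into columns
coef_wide <- toy %>%
  group_by(Industry, Quater) %>%
  group_modify(~ tidy(lm(Y ~ x1 + x2 + x3, data = .x))) %>%
  ungroup() %>%
  select(Industry, Quater, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate, names_prefix = "b_")

# Every firm row receives the coefficients of its industry-quarter
result <- left_join(toy, coef_wide, by = c("Industry", "Quater"))
head(result)
```

The intercept column ends up with the non-syntactic name `b_(Intercept)`, so it has to be referred to with backticks or renamed.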

regression

r

panel-data