在 x 和 y 变量列表上循环面板数据回归

Question

我正在使用面板数据并希望从变量列表中进行一系列双变量固定效应回归。

我的数据集的一小部分如下所示：

library(plm)
library(dplyr)

structure(list(v2x_libdem = c(0.876, 0.88, 0.763, 0.779), v2x_partipdem = c(0.679, 
0.68, 0.647, 0.652), v2x_frassoc_thick = c(0.937, 0.937, 0.9, 
0.899), country_name = c("Sweden", "Sweden", "Hungary", "Hungary"
), year = c(2000, 2008, 2000, 2008)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 4 × 5
  v2x_libdem v2x_partipdem v2x_frassoc_thick country_name  year
       <dbl>         <dbl>             <dbl> <chr>        <dbl>
1      0.876         0.679             0.937 Sweden        2000
2      0.88          0.68              0.937 Sweden        2008
3      0.763         0.647             0.9   Hungary       2000
4      0.779         0.652             0.899 Hungary       2008

我的变量列表如下所示：

vars <- c("v2x_libdem", "v2x_partipdem", "v2x_frassoc_thick")

这些变量应该同时是 x 和 y，因此作为 x 和 y 的变量的每个组合都应该在回归中。

我写了一个进行面板数据建模的函数：

paneldata_function <- function (y, x) {     #this function will do a PLM 
  plm(paste(y, "~", x), 
      data = Vdem_west,
      model = "within",
      index = c("year", "country_name"))   #for a given x & y variable
}

我需要用 x 和 y 的每个组合循环遍历这个函数。最好，我希望输出是四个值向量，我可以将其转换为数据集；一种用于系数，一种用于 x 变量，一种用于 y 变量，一种用于 p 值。

如果我只做一个模型，我可以轻松访问这些：

e <- paneldata_function("v2x_libdem", "v2x_partipdem")

p_value <- e[["fstatistic"]][["p.value"]]

x_var <- names(e[["model"]])[2]

y_var <- names(e[["model"]])[1]

如何编写循环才能将这些向量作为输出？

非常感谢任何帮助，我希望我的问题表述得尽可能清楚。

Answer 1

要遍历可能的字段，首先，您需要能够提供该信息的东西。无论您的数据集大小如何，此功能都可以使用，但您必须调整要保留的字段。（例如，这里你需要排除最后两个字段，所以我 select 1:3。有很多方法可以制作这个列表，但很多都是重复的（或相同的两个字段，但一次是 left-right，下一次是 right-left）。此方法使用包 RcppAlgos。2 表示您想要多少个变量。F 或 FALSE 仅呈现 return 个唯一对。

ndf = names(Vdem_west)   
cbo = RcppAlgos::comboGeneral(ndf[1:3], 2, F) # make a list of possible combinations
#      [,1]            [,2]               
# [1,] "v2x_libdem"    "v2x_partipdem"    
# [2,] "v2x_libdem"    "v2x_frassoc_thick"
# [3,] "v2x_partipdem" "v2x_frassoc_thick"

然后使用你现在的函数，我在这个 map 调用中调用它。 map 在 tidyverse.

# run all the models; store results in list of lists
res = map(1:nrow(cbo),
          ~paneldata_function(cbo[.x, 1], cbo[.x, 2]))

# name the models
names(res) <- paste0(cbo[, 1], "_", cbo[, 2])
names(res) # check it
# [1] "v2x_libdem_v2x_partipdem"        "v2x_libdem_v2x_frassoc_thick"   
# [3] "v2x_partipdem_v2x_frassoc_thick"

对于系数数据，我将输出写入数据框。你没有问型号名称，但我也添加了它。

map_dfr(1:length(res),
        .f = function(x){
          estP = summary(res[[x]])$coefficients[c(1, 4)]
          coef = dimnames(summary(res[[x]])$coefficients)[1]
          model = names(res)[x]
          c(Coefficient = coef, Est = round(estP[1], 4), 
            PV = round(estP[2], 4), Model = model)
        })
# # A tibble: 3 × 4
#   Coefficient         Est     PV Model                          
#   <chr>             <dbl>  <dbl> <chr>                          
# 1 v2x_partipdem     3.56  0.0067 v2x_libdem_v2x_partipdem       
# 2 v2x_frassoc_thick 2.85  0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 v2x_frassoc_thick 0.799 0.0509 v2x_partipdem_v2x_frassoc_thick

我还整理了一个电话，以便您也可以查看模型性能。这可以与之前的调用相结合，但您没有要求这部分。（实际上我先做了这部分，然后意识到我没有回答你的问题。）

map_dfr(1:length(res),
        .f = function(x){
          R2 = summary(res[[x]])[["r.squared"]][["rsq"]] %>% round(4)
          Adj.R2 = summary(res[[x]])[["r.squared"]][["adjrsq"]] %>% round(4)
          PV = summary(res[[x]])[["fstatistic"]][["p.value"]] %>% round(4)
          Model = names(res)[x]
          c(R2 = R2, Adj.R2 = Adj.R2, PV = PV, Model = Model)
        })
# # A tibble: 3 × 4
#   R2     Adj.R2 PV.F   Model                          
#   <chr>  <chr>  <chr>  <chr>                          
# 1 0.9999 0.9997 0.0067 v2x_libdem_v2x_partipdem       
# 2 0.9952 0.9856 0.0441 v2x_libdem_v2x_frassoc_thick   
# 3 0.9936 0.9809 0.0509 v2x_partipdem_v2x_frassoc_thick

您可以使用相同的方法来测试您的模型并查看假设。

在 x 和 y 变量列表上循环面板数据回归

Loop paneldata regression over list of x and y variables

r

panel-data

dplyr