Remove linearly dependent variables while using the bife package

Question

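Some pre-programmed models in R automatically remove linearly dependent variables from their regression output (e.g. lm()). With the bife package, this does not seem to be possible. As stated on page 5 of the package documentation on CRAN: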

If bife does not converge this is usually a sign of linear dependence between one or more regressors and the fixed effects. In this case, you should carefully inspect your model specification.

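Now suppose the problem at hand involves running many regressions, so that each regression output cannot be inspected adequately, and some rule of thumb about the regressors has to be assumed. What alternatives are there for removing linearly dependent regressors more or less automatically and arriving at an adequate model specification?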

Below is some code as an example:

# Sample code

x=10*rnorm(40)
z=100*rnorm(40)

df1=data.frame(a=rep(c(0,1),times=20), x=x, y=x, z=z, ID=c(1:40), date=1, Region=rep(c(1,2, 3, 4),10))
df2=data.frame(a=c(rep(c(1,0),times=15),rep(c(0,1),times=5)), x=1.4*x+4, y=1.4*x+4, z=1.2*z+5, ID=c(1:40), date=2, Region=rep(c(1,2,3,4),10))
df3=rbind(df1,df2)


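results = list()

for (i in 1:4) {

  # Subset to one region; y is an exact copy of x, so the two are collinear
  sub = df3[df3$Region == i, ]

  # This call fails because of the linear dependence between x and y
  model = bife::bife(a ~ x + y + z | ID, data = sub)

  # Store each fitted model by region
  results[[i]] = model

}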

Error: Linear dependent terms detected!

Answer 1

Since you are only looking at linear dependencies, you can simply leverage methods that detect them, such as lm.
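As a rough illustration of that idea, here is a minimal base R sketch. The toy data and the names toy, dropped and kept are made up for this example. It relies on lm() reporting NA coefficients for regressors it found to be linearly dependent, and it deliberately ignores the ID fixed effects, so it only catches dependence among the regressors themselves.

```r
# Minimal sketch: detect linearly dependent regressors with lm().
# The toy data and all object names are illustrative assumptions.
set.seed(1)
x <- rnorm(20)
z <- rnorm(20)
toy <- data.frame(a = rep(c(0, 1), times = 10), x = x, y = 2 * x + 1, z = z)

# lm() keeps the first column of a dependent set and reports NA for the others
fit <- lm(a ~ x + y + z, data = toy)
dropped <- names(which(is.na(coef(fit))))
kept <- setdiff(c("x", "y", "z"), dropped)

print(dropped)  # regressors flagged as linearly dependent
print(kept)     # regressors that can be passed on to the next model
```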

Here is an example solution using the package fixest:

library(bife)
library(fixest)

x = 10*rnorm(40)
z = 100*rnorm(40)

df1 = data.frame(a=rep(c(0,1),times=20), x=x, y=x, z=z, ID=c(1:40), date=1, Region=rep(c(1,2, 3, 4),10))

df2 = data.frame(a=c(rep(c(1,0),times=15),rep(c(0,1),times=5)), x=1.4*x+4, y=1.4*x+4, z=1.2*z+5, ID=c(1:40), date=2, Region=rep(c(1,2,3,4),10))

df3 = rbind(df1, df2)

vars = c("x", "y", "z")

res_all = list()
for(i in 1:4) {
    x = df3[df3$Region == i, ]

    coll_vars = feols(a ~ x + y + z | ID, x, notes = FALSE)$collin.var
    new_fml = xpd(a ~ ..vars | ID, ..vars = setdiff(vars, coll_vars))
    res_all[[i]] = bife::bife(new_fml, data = x)
}

# Display all results
for(i in 1:4) {
    cat("\n#\n# Region: ", i, "\n#\n\n")
    print(summary(res_all[[i]]))
}

The functions needed here are feols and xpd, both from fixest. Some explanations:

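feols, like lm, removes variables on the fly when they are found to be collinear. It stores the names of the collinear variables in the slot $collin.var (NULL if none is found).
Unlike lm, feols also allows fixed effects, so you can include them when looking for linear dependencies: this way you also detect complex linear dependencies that involve the fixed effects.
I have set notes = FALSE, otherwise feols would print a note about the collinearity.
feols is fast (for large data sets, actually faster than lm), so it will not put a strain on your analysis.
The function xpd expands the formula and replaces any variable name starting with two dots with the associated argument provided by the user.
- When an argument of xpd is a vector, the elements are joined with plus signs, so if ..vars = c("x", "y") is provided, the formula a ~ ..vars | ID becomes a ~ x + y | ID.
- Here ..vars in the formula is replaced with setdiff(vars, coll_vars), i.e. the vector of variables that were not found to be collinear.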
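For readers who want to see what the formula rebuilding amounts to, the following base R sketch does the same substitution without xpd. The helper build_formula is hypothetical (it is not part of fixest or bife), and the value of coll_vars is a stand-in for what feols(...)$collin.var might return. The sketch also guards against the case where every regressor is dropped, and it relies on setdiff leaving vars untouched when coll_vars is NULL.

```r
# Hypothetical helper, not part of fixest or bife: rebuild the formula in base R
build_formula <- function(response, regressors, fe) {
  if (length(regressors) == 0) {
    stop("no regressors left after removing collinear variables")
  }
  rhs <- paste(regressors, collapse = " + ")
  as.formula(paste(response, "~", rhs, "|", fe), env = parent.frame())
}

vars <- c("x", "y", "z")
coll_vars <- "y"  # stand-in for the value of feols(...)$collin.var

new_fml <- build_formula("a", setdiff(vars, coll_vars), "ID")
print(new_fml)

# When nothing is collinear, $collin.var is NULL and all variables are kept
all_kept <- setdiff(vars, NULL)
```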

So you end up with an algorithm that automatically removes variables before performing the bife estimation.

Finally, a side note: it is usually best to store results in a list, since this avoids copying.
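A small sketch of that pattern, using placeholder values rather than real estimates: each iteration writes one small data frame into a preallocated list, and the pieces are bound together once at the end instead of calling rbind inside the loop. The names res_list and res_df are illustrative.

```r
# Sketch: collect one data frame per region in a list, then bind once.
# The estimate column holds NA placeholders, not results from any model.
regions <- 1:4
res_list <- vector("list", length(regions))

for (i in regions) {
  res_list[[i]] <- data.frame(Region = i, term = c("x", "z"), estimate = NA_real_)
}

# A single rbind at the end avoids recopying a growing data frame each iteration
res_df <- do.call(rbind, res_list)
print(res_df)
```

In the loop above, the placeholder column could be filled from coef(res_all[[i]]) once the models have been fitted.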

Update

I forgot to mention: if you don't need the bias correction (bife::bias_corr), you can directly use fixest::feglm, which automatically removes collinear variables:

res_bife = bife::bife(a ~ x + z | ID, data = df3)
res_feglm = fixest::feglm(a ~ x + y + z | ID, df3, family = binomial)

rbind(coef(res_bife), coef(res_feglm))
#>                x          z
#> [1,] -0.02221848 0.03045968
#> [2,] -0.02221871 0.03045990
