如何计算 Longitudinal/Panel 数据的非线性（二元）固定效应 Logit？

Question

我正在尝试根据儿童上学愿望的滞后变量来估算儿童工作。
我正在决定是否应该使用 glm 或 clogit 来运行我的模型（需要固定效果 logits）。当我运行我的 glm 时，我的系数与我的 clogit 非常不同。

model1 <- glm(chldwork~lag_aspgrade_binned+age+as.factor(childid), data=finaletdtlag, family='binomial')

  GLM Output:
Call:
glm(formula = chldwork ~ lag_aspgrade_binned + age + as.factor(childid), 
    family = "binomial", data = finaletdtlag)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-3.02350   0.00001   0.00002   0.17344   2.13769  

Coefficients:
                                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                                     3.037e+01  1.933e+04   0.002   0.9987    
lag_aspgrade_binneddid not complete elementary  2.339e+00  1.083e+00   2.161   0.0307 *  
lag_aspgrade_binnedhs                           1.252e+00  6.082e-01   2.059   0.0395 *  
lag_aspgrade_binnedprimary some hs              1.206e+00  6.739e-01   1.789   0.0735 .  
lag_aspgrade_binnedsome college                 2.081e+00  4.800e-01   4.335 1.46e-05 ***
age                                            -6.123e-01  3.995e-02 -15.326  < 2e-16 ***

此外，当我运行我的 clogit 时，我没有在我的输出中截取（如这个例子所示：https://data.princeton.edu/wws509/r/fixedRandom3）。

我的 clogit 输出：

 > modela <- clogit(chldwork~lag_aspgrade_binned+age+strata(childid), data=finaletdtlag, method = 'exact')
> summary(modela)
Call:
coxph(formula = Surv(rep(1, 2770L), chldwork) ~ lag_aspgrade_binned + 
    age + strata(childid), data = finaletdtlag, method = "exact")

  n= 2770, number of events= 2358 

                                                   coef exp(coef) se(coef)       z Pr(>|z|)    
lag_aspgrade_binneddid not complete elementary  1.09351   2.98473  0.83332   1.312  0.18944    
lag_aspgrade_binnedhs                           0.53032   1.69948  0.45095   1.176  0.23959    
lag_aspgrade_binnedprimary some hs              0.49815   1.64567  0.50075   0.995  0.31983    
lag_aspgrade_binnedsome college                 1.00269   2.72560  0.34619   2.896  0.00377 ** 
age                                            -0.36846   0.69180  0.02905 -12.684  < 2e-16 ***

我的代码有错误吗？你们都喜欢其中一个吗？

Answer 1

使用最小二乘虚拟变量 (LSDV) 方法计算非线性（二元）固定效应 (FE) 模型，就像您使用 glm 一样，尚未考虑附带参数问题 (IPP) ，因此估计量是有偏的。基本上，IPP 发生在观察的数量，因此个体假人的数量，相对于观察到的时间段很大。您可以在 this answer on Cross Validated.

中找到 IPP 的统计解释

survival::clogit 以及 bife::bife 结合 bife::bias_corr 将 IPP 考虑在内。

让我们尝试使用 glm 计算二元有限元模型（参见答案底部的“数据”）。

s <- summary(g.fit <- glm(y ~ 0 + x1 + x2 + factor(id), data=idc, family='binomial'))$coe
s[-grep("factor", rownames(s)), ]
#      Estimate Std. Error  z value   Pr(>|z|)
# x1 0.80750181 0.32069729 2.517956 0.01180379
# x2 0.08353417 0.05040558 1.657240 0.09747086

用 bife::bife 计算模型最初得到相同的结果。

summary(b.fit <- bife::bife(y ~ x1 + x2 | id, data=idc))$cm
#      Estimate Std. error  z value  Pr(> |z|)
# x1 0.80750173 0.32070309 2.517911 0.01180533
# x2 0.08353417 0.05040645 1.657212 0.09747662

但是，结果有偏差，当我们校正 IPP 时，我们得到更小的估计值：

summary(b.fit.corr <- bias_corr(b.fit))$cm
#      Estimate Std. error  z value  Pr(> |z|)
# x1 0.65672233  0.3192673 2.056967 0.03968939
# x2 0.06672389  0.0498996 1.337163 0.18116946

clogit 现在已经考虑了 IPP。

summary(survival::clogit(y ~ x1 + x2 + strata(id), data=idc))$coe
#         coef exp(coef)  se(coef)        z   Pr(>|z|)
# x1 0.6533630  1.921994 0.2875215 2.272397 0.02306255
# x2 0.0659169  1.068138 0.0449555 1.466270 0.14257474

请注意，bife::bife 计算标准误差更为保守。但是，使用 Stata 我们得到与 survival::clogit:

相同的结果

. use http://www.stata-press.com/data/r13/clogitid, clear

. clogit y x1 x2, group(id) noheader
note: multiple positive outcomes within groups encountered.

Iteration 0:   log likelihood = -123.42828  
Iteration 1:   log likelihood = -123.41386  
Iteration 2:   log likelihood = -123.41386  
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .653363   .2875215     2.27   0.023     .0898312    1.216895
          x2 |   .0659169   .0449555     1.47   0.143    -.0221943    .1540281
------------------------------------------------------------------------------

因此，要回答您的问题，您的 clogit 方法的结果应该是首选。

您的“丢失”拦截问题解释如下。在线性回归模型中，我们有 one 总截距 a,

而在固定效应中我们有许多个人截距 a_i.

bife::bife 将固定效果存储在名为 alpha:

的对象中

b.fit.corr$alpha
#         1014         1017         1019         1023         1025         1027 
# -1.192037100 -0.740369821 -0.093573868 -0.740850707  0.374930145 -1.179830711 
#         1029         1030         1031         1032         1039         1043 
# -0.702326346 -0.460228413 -1.441325728 -1.286722837 -0.117864675 -2.140867994 
#         1044         1045         1047         1050         1055         1060 
# -2.114299987  0.645971508 -0.436457378 -1.165316816 -1.657762052 -0.720114822 
#         1069         1073         1074         1075         1076         1077 
# -1.637936117 -0.782373571 -1.395162657 -1.395167427 -1.637684316 -1.696587849 
#         1078         1092         1094         1095         1097         1098 
# -1.106264431 -0.127039684 -1.439563580 -1.310283872 -0.778302356 -1.982045758 
#         1099         1104         1105         1108         1110         1113 
# -0.005352088 -0.006881257 -1.802152970 -1.165296728 -0.314567361 -1.817725352 
#         1118         1120         1121         1122         1125         1126 
# -1.110847553 -2.128173826 -1.803025930 -1.164773956 -1.107030040 -2.251146286 
#         1128         1133         1137         1141         1142         1144 
# -0.940981858 -1.416409893 -1.441811848 -0.330724832 -1.657610560 -1.136508411 
#         1146         1147         1148         1149         1150         1154 
# -1.286055926  0.196135238 -1.107793309 -1.637240256 -2.226747493 -0.701304388 
#         1155         1156         1157         1163         1169         1172 
# -1.658472805 -1.654763658 -1.134978654 -2.024766764 -1.440093115 -0.940165139 
#         1176         1181         1186         1187         1191         1195 
# -0.481589127 -2.114877897 -1.137394808 -0.006881257 -1.636654053 -0.027152409

它们对应于glm模型中的个体假人，

g.fit$coefficients[grep("factor", names(g.fit$coe))]

然而——如上所述——由于 IPP，这些都是有偏见的。

注意：我找不到用survival::clogit提取FE的方法，如果有人知道怎么做，请在评论中告诉我!

数据：

idc <- readstata13::read.dta13("http://www.stata-press.com/data/r11/clogitid.dta")

如何计算 Longitudinal/Panel 数据的非线性（二元）固定效应 Logit？

How to calculate nonlinear (binary) Fixed-Effects Logit for Longitudinal/Panel Data?

r

glm

logistic-regression

longitudinal

数据：