R plm lag - what is the equivalent to L1.x in Stata?

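Using the plm package in R to fit a fixed-effects model, what is the correct syntax to add a lagged variable to the model? I am looking for something similar to the 'L1.variable' operator in Stata.

Here is my attempt at adding a lagged variable (this is a test model and it might not make sense):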

library(plm)
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- plm.data(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp+lag(wks_work,1)
           , model = 'within'
           , data = nlswork)
summary(ffe)

R output:

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = nlswork, 
    model = "within")

Unbalanced Panel: n=3911, T=1-14, N=19619

Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-1.77000 -0.10100  0.00293  0.11000  2.90000 

Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)    
ttl_exp       0.02341057 0.00073832 31.7078 < 2.2e-16 ***
lag(wks_work) 0.00081576 0.00010628  7.6755 1.744e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    1296.9
Residual Sum of Squares: 1126.9
R-Squared:      0.13105
Adj. R-Squared: -0.085379
F-statistic: 1184.39 on 2 and 15706 DF, p-value: < 2.22e-16

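However, I get different results compared to what Stata produces.

In my actual model, I would like to instrument an endogenous variable with its lagged value.

Thanks!

For reference, here is the Stata code: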

webuse nlswork.dta
xtset idcode year
xtreg ln_wage ttl_exp L1.wks_work, fe

Stata output:

Fixed-effects (within) regression               Number of obs     =     10,680
Group variable: idcode                          Number of groups  =      3,671

R-sq:                                           Obs per group:
     within  = 0.1492                                         min =          1
     between = 0.2063                                         avg =        2.9
     overall = 0.1483                                         max =          8

                                                F(2,7007)         =     614.60
corr(u_i, Xb)  = 0.1329                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ttl_exp |   .0192578   .0012233    15.74   0.000     .0168597    .0216558
             |
    wks_work |
         L1. |   .0015891   .0001957     8.12   0.000     .0012054    .0019728
             |
       _cons |   1.502879   .0075431   199.24   0.000     1.488092    1.517666
-------------+----------------------------------------------------------------
     sigma_u |  .40678942
     sigma_e |  .28124886
         rho |  .67658275   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3670, 7007) = 4.71                  Prob > F = 0.0000

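lag(), as implemented in plm, lags the observations row-wise without "looking" at the time variable, i.e. it simply shifts the variable by one row (within each individual). If there are gaps in the time dimension, you probably want to take the value of the time variable into account. There is the (as of now) unexported function plm:::lagt.pseries, which takes the time variable into account and hence handles gaps in the data as you would expect.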

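Edit: Since plm version 1.7-0, the default behaviour of lag in plm is to shift time-wise. The behaviour can be controlled with the argument shift (shift = c("time", "row")) to shift either time-wise or row-wise (the old behaviour).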
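
To make the difference between the two shifting modes concrete, here is a minimal sketch on a made-up panel. The data frame toy and its columns id, year and x are hypothetical and not part of nlswork; the sketch assumes plm 1.7-0 or later, where lag() on a pseries accepts the shift argument.

```r
library(plm)

# Hypothetical toy panel: individual 1 is observed in 2000, 2001 and 2003,
# so the year 2002 is a gap; individual 2 has no gaps.
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2000, 2001, 2003, 2000, 2001, 2002),
  x    = c(10, 11, 13, 20, 21, 22)
)
ptoy <- pdata.frame(toy, index = c("id", "year"))

# Time-wise shift: looks for the value one period earlier (like Stata's L1.x)
lag_time <- lag(ptoy$x, k = 1, shift = "time")

# Row-wise shift: takes the previous row of the same individual,
# whatever its time stamp is
lag_row <- lag(ptoy$x, k = 1, shift = "row")

# Compare both side by side; the rows differ only for id 1 in 2003,
# where the time-wise lag is NA and the row-wise lag is the 2001 value
print(data.frame(toy,
                 lag_time = as.numeric(lag_time),
                 lag_row  = as.numeric(lag_row)))
```

With row-wise shifting, the observation for 2003 silently receives the 2001 value, which is why the estimation sample and the coefficients in the question differ from Stata's.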

Use plm:::lagt.pseries as follows:

library(plm)
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- pdata.frame(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,1)
           , model = 'within'
           , data = pnlswork)
summary(ffe)

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work, 
    1), data = nlswork, model = "within")

Unbalanced Panel: n=3671, T=1-8, N=10680

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max. 
-1.5900 -0.0859  0.0000  0.0957  2.5600 

Coefficients :
                                  Estimate Std. Error t-value  Pr(>|t|)    
ttl_exp                         0.01925775 0.00122330 15.7425 < 2.2e-16 ***
plm:::lagt.pseries(wks_work, 1) 0.00158907 0.00019573  8.1186 5.525e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    651.49
Residual Sum of Squares: 554.26
R-Squared:      0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16

By the way: it is better to use pdata.frame() instead of plm.data().

You can also check for gaps in your data with plm's is.pconsecutive():
is.pconsecutive(pnlswork)
all(is.pconsecutive(pnlswork))
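
As a small illustration of what these checks return, the sketch below runs is.pconsecutive() on a made-up panel and lists the individuals that have gaps. The toy data and its column names are hypothetical; the sketch relies on is.pconsecutive() returning one logical value per individual, named by the individual's id.

```r
library(plm)

# Hypothetical toy panel: individual 1 skips the year 2002
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2000, 2001, 2003, 2000, 2001, 2002),
  x    = c(10, 11, 13, 20, 21, 22)
)
ptoy <- pdata.frame(toy, index = c("id", "year"))

# One logical per individual: TRUE if its time periods are consecutive
consec <- is.pconsecutive(ptoy)

# Ids of the individuals whose time series contain gaps
ids_with_gaps <- names(consec)[!consec]
print(ids_with_gaps)
```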

You could also make the data consecutive first and then use lag(), like this:

pnlswork2 <- make.pconsecutive(pnlswork)
pnlswork2$wks_work_lag <- lag(pnlswork2$wks_work)
ffe2 <- plm(ln_wage ~ ttl_exp + wks_work_lag
           , model = 'within'
           , data = pnlswork2)
summary(ffe2)

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + wks_work_lag, data = pnlswork2, 
    model = "within")

Unbalanced Panel: n=3671, T=1-8, N=10680

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max. 
-1.5900 -0.0859  0.0000  0.0957  2.5600 

Coefficients :
               Estimate Std. Error t-value  Pr(>|t|)    
ttl_exp      0.01925775 0.00122330 15.7425 < 2.2e-16 ***
wks_work_lag 0.00158907 0.00019573  8.1186 5.525e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    651.49
Residual Sum of Squares: 554.26
R-Squared:      0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
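
Why this gives the same result can be seen on a small made-up panel: make.pconsecutive() inserts a row of NAs for each missing period, so that shifting by one row is the same as shifting by one period. This is only a sketch; the toy data and its column names are hypothetical.

```r
library(plm)

# Hypothetical toy panel: individual 1 skips the year 2002
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2000, 2001, 2003, 2000, 2001, 2002),
  x    = c(10, 11, 13, 20, 21, 22)
)
ptoy <- pdata.frame(toy, index = c("id", "year"))

# Fill the gap: a row for id 1 in 2002 is added, with NA in x
ptoy2 <- make.pconsecutive(ptoy)

# Row-wise lag on the consecutive data ...
lag_row_filled <- as.numeric(lag(ptoy2$x, shift = "row"))
# ... restricted to the rows that were actually observed ...
observed <- !is.na(as.numeric(ptoy2$x))
# ... agrees with the time-wise lag on the original data
lag_time_orig <- as.numeric(lag(ptoy$x, shift = "time"))

print(identical(lag_row_filled[observed], lag_time_orig))
```

The inserted rows have NA in every model variable, so plm() drops them at estimation time and the sample is the same as with the time-wise lag.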

Or simply:

ffe3 <- plm(ln_wage ~ ttl_exp + lag(wks_work)
            , model = 'within'
            , data = pnlswork2) # note: it is the consecutive panel data set here
summary(ffe3)

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = pnlswork2, 
    model = "within")

Unbalanced Panel: n=3671, T=1-8, N=10680

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max. 
-1.5900 -0.0859  0.0000  0.0957  2.5600 

Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)    
ttl_exp       0.01925775 0.00122330 15.7425 < 2.2e-16 ***
lag(wks_work) 0.00158907 0.00019573  8.1186 5.525e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    651.49
Residual Sum of Squares: 554.26
R-Squared:      0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16