R plm lag - Stata 中 L1.x 的等价物是什么?
R plm lag - what is the equivalent to L1.x in Stata?
使用 R 中的 plm
包来拟合固定效应模型,向模型添加滞后变量的正确语法是什么?类似于 Stata 中的 'L1.variable' 命令。
这是我添加滞后变量的尝试(这是一个测试模型,可能没有意义):
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- plm.data(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp+lag(wks_work,1)
, model = 'within'
, data = nlswork)
summary(ffe)
R输出:
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = nlswork,
model = "within")
Unbalanced Panel: n=3911, T=1-14, N=19619
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.77000 -0.10100 0.00293 0.11000 2.90000
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.02341057 0.00073832 31.7078 < 2.2e-16 ***
lag(wks_work) 0.00081576 0.00010628 7.6755 1.744e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 1296.9
Residual Sum of Squares: 1126.9
R-Squared: 0.13105
Adj. R-Squared: -0.085379
F-statistic: 1184.39 on 2 and 15706 DF, p-value: < 2.22e-16
但是,与 Stata 生成的结果相比,我得到了不同的结果。
在我的实际模型中,我想检测一个内生变量及其滞后值。
谢谢!
作为参考,这里是 Stata 代码:
webuse nlswork.dta
xtset idcode year
xtreg ln_wage ttl_exp L1.wks_work, fe
Stata 输出:
Fixed-effects (within) regression Number of obs = 10,680
Group variable: idcode Number of groups = 3,671
R-sq: Obs per group:
within = 0.1492 min = 1
between = 0.2063 avg = 2.9
overall = 0.1483 max = 8
F(2,7007) = 614.60
corr(u_i, Xb) = 0.1329 Prob > F = 0.0000
------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ttl_exp | .0192578 .0012233 15.74 0.000 .0168597 .0216558
|
wks_work |
L1. | .0015891 .0001957 8.12 0.000 .0012054 .0019728
|
_cons | 1.502879 .0075431 199.24 0.000 1.488092 1.517666
-------------+----------------------------------------------------------------
sigma_u | .40678942
sigma_e | .28124886
rho | .67658275 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3670, 7007) = 4.71 Prob > F = 0.0000
lag()
就像在 plm
中一样,在不“查看”时间变量的情况下逐行滞后观察,即它移动变量(每个个体)。如果时间维度中存在间隙,您可能需要考虑时间变量的值。有(截至目前)未导出的函数 plm:::lagt.pseries
,它考虑了时间变量,因此可以像您预期的那样处理数据中的间隙。
Edit:自 plm 1.7-0 版以来,plm 中 lag
的默认行为是按时间移动,但可以通过参数 [=18] 控制行为=](shift = c("time", "row")
) 按时间或按行移动(旧行为)。
使用如下:
library(plm)
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- pdata.frame(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,1)
, model = 'within'
, data = pnlswork)
summary(ffe)
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,
1), data = nlswork, model = "within")
Unbalanced Panel: n=3671, T=1-8, N=10680
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.5900 -0.0859 0.0000 0.0957 2.5600
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.01925775 0.00122330 15.7425 < 2.2e-16 ***
plm:::lagt.pseries(wks_work, 1) 0.00158907 0.00019573 8.1186 5.525e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 651.49
Residual Sum of Squares: 554.26
R-Squared: 0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
顺便说一句:最好使用 pdata.frame()
而不是 plm.data()
。
顺便说一句:您可以使用 plm 的 is.pconsecutive()
:
检查数据中的差距
is.pconsecutive(pnlswork)
all(is.pconsecutive(pnlswork))
也可以先让数据连续,再用lag()
,像这样:
pnlswork2 <- make.pconsecutive(pnlswork)
pnlswork2$wks_work_lag <- lag(pnlswork2$wks_work)
ffe2 <- plm(ln_wage ~ ttl_exp + wks_work_lag
, model = 'within'
, data = pnlswork2)
summary(ffe2)
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + wks_work_lag, data = pnlswork2,
model = "within")
Unbalanced Panel: n=3671, T=1-8, N=10680
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.5900 -0.0859 0.0000 0.0957 2.5600
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.01925775 0.00122330 15.7425 < 2.2e-16 ***
wks_work_lag 0.00158907 0.00019573 8.1186 5.525e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 651.49
Residual Sum of Squares: 554.26
R-Squared: 0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
或者简单地说:
ffe3 <- plm(ln_wage ~ ttl_exp + lag(wks_work)
, model = 'within'
, data = pnlswork2) # note: it is the consecutive panel data set here
summary(ffe3)
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = pnlswork2,
model = "within")
Unbalanced Panel: n=3671, T=1-8, N=10680
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.5900 -0.0859 0.0000 0.0957 2.5600
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.01925775 0.00122330 15.7425 < 2.2e-16 ***
lag(wks_work) 0.00158907 0.00019573 8.1186 5.525e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 651.49
Residual Sum of Squares: 554.26
R-Squared: 0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
使用 R 中的 plm
包来拟合固定效应模型,向模型添加滞后变量的正确语法是什么?类似于 Stata 中的 'L1.variable' 命令。
这是我添加滞后变量的尝试(这是一个测试模型,可能没有意义):
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- plm.data(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp+lag(wks_work,1)
, model = 'within'
, data = nlswork)
summary(ffe)
R输出:
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = nlswork,
model = "within")
Unbalanced Panel: n=3911, T=1-14, N=19619
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.77000 -0.10100 0.00293 0.11000 2.90000
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.02341057 0.00073832 31.7078 < 2.2e-16 ***
lag(wks_work) 0.00081576 0.00010628 7.6755 1.744e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 1296.9
Residual Sum of Squares: 1126.9
R-Squared: 0.13105
Adj. R-Squared: -0.085379
F-statistic: 1184.39 on 2 and 15706 DF, p-value: < 2.22e-16
但是,与 Stata 生成的结果相比,我得到了不同的结果。
在我的实际模型中,我想检测一个内生变量及其滞后值。
谢谢!
作为参考,这里是 Stata 代码:
webuse nlswork.dta
xtset idcode year
xtreg ln_wage ttl_exp L1.wks_work, fe
Stata 输出:
Fixed-effects (within) regression Number of obs = 10,680
Group variable: idcode Number of groups = 3,671
R-sq: Obs per group:
within = 0.1492 min = 1
between = 0.2063 avg = 2.9
overall = 0.1483 max = 8
F(2,7007) = 614.60
corr(u_i, Xb) = 0.1329 Prob > F = 0.0000
------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ttl_exp | .0192578 .0012233 15.74 0.000 .0168597 .0216558
|
wks_work |
L1. | .0015891 .0001957 8.12 0.000 .0012054 .0019728
|
_cons | 1.502879 .0075431 199.24 0.000 1.488092 1.517666
-------------+----------------------------------------------------------------
sigma_u | .40678942
sigma_e | .28124886
rho | .67658275 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3670, 7007) = 4.71 Prob > F = 0.0000
lag()
就像在 plm
中一样,在不“查看”时间变量的情况下逐行滞后观察,即它移动变量(每个个体)。如果时间维度中存在间隙,您可能需要考虑时间变量的值。有(截至目前)未导出的函数 plm:::lagt.pseries
,它考虑了时间变量,因此可以像您预期的那样处理数据中的间隙。
Edit:自 plm 1.7-0 版以来,plm 中 lag
的默认行为是按时间移动,但可以通过参数 [=18] 控制行为=](shift = c("time", "row")
) 按时间或按行移动(旧行为)。
使用如下:
library(plm)
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- pdata.frame(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,1)
, model = 'within'
, data = pnlswork)
summary(ffe)
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,
1), data = nlswork, model = "within")
Unbalanced Panel: n=3671, T=1-8, N=10680
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.5900 -0.0859 0.0000 0.0957 2.5600
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.01925775 0.00122330 15.7425 < 2.2e-16 ***
plm:::lagt.pseries(wks_work, 1) 0.00158907 0.00019573 8.1186 5.525e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 651.49
Residual Sum of Squares: 554.26
R-Squared: 0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
顺便说一句:最好使用 pdata.frame()
而不是 plm.data()
。
顺便说一句:您可以使用 plm 的 is.pconsecutive()
:
is.pconsecutive(pnlswork)
all(is.pconsecutive(pnlswork))
也可以先让数据连续,再用lag()
,像这样:
pnlswork2 <- make.pconsecutive(pnlswork)
pnlswork2$wks_work_lag <- lag(pnlswork2$wks_work)
ffe2 <- plm(ln_wage ~ ttl_exp + wks_work_lag
, model = 'within'
, data = pnlswork2)
summary(ffe2)
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + wks_work_lag, data = pnlswork2,
model = "within")
Unbalanced Panel: n=3671, T=1-8, N=10680
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.5900 -0.0859 0.0000 0.0957 2.5600
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.01925775 0.00122330 15.7425 < 2.2e-16 ***
wks_work_lag 0.00158907 0.00019573 8.1186 5.525e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 651.49
Residual Sum of Squares: 554.26
R-Squared: 0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
或者简单地说:
ffe3 <- plm(ln_wage ~ ttl_exp + lag(wks_work)
, model = 'within'
, data = pnlswork2) # note: it is the consecutive panel data set here
summary(ffe3)
Oneway (individual) effect Within Model
Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = pnlswork2,
model = "within")
Unbalanced Panel: n=3671, T=1-8, N=10680
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-1.5900 -0.0859 0.0000 0.0957 2.5600
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
ttl_exp 0.01925775 0.00122330 15.7425 < 2.2e-16 ***
lag(wks_work) 0.00158907 0.00019573 8.1186 5.525e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 651.49
Residual Sum of Squares: 554.26
R-Squared: 0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16