undefined columns selected in plm() function
I ran into a strange problem with the plm() function. Here is the code:
# Data generation (requires the data.table and plm packages)
library(data.table)
library(plm)
n <- 500
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 50
y <- -100*z+ 1100 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt1 <- data.table('id'=1:500,'sat'=y,'income'=x,'group'=rep(1,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 80
y <- -80*z+ 1200 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt2 <- data.table('id'=501:1000,'sat'=y,'income'=x,'group'=rep(2,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 30
y <- -120*z+ 1000 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt3 <- data.table('id'=1001:1500,'sat'=y,'income'=x,'group'=rep(3,n))
dtable <- merge(dt1 ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)
# Model
dtable_p <- pdata.frame(dtable, index = "group")
mod_1 <- plm(sat ~ income, data = dtable_p,model = "pooling")
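Error in `[.data.frame`(x, , which) : undefined columns selected
I have checked every possibility I can think of, but I don't understand why it gives me an error. The column names are correct, so why does R say the columns are undefined? Thanks!
Follow-up: I am adding another test, using the dataset that @StupidWolf used to demonstrate the problem: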
data("Produc", package = "plm")
form <- log(gsp) ~ log(pc)
Produc$group <- Produc$region
pProduc <- pdata.frame(Produc, index = "group")
Produc$group <- rep(1:48, each = 17)
summary(plm(form, data = pProduc, model = "pooling"))
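Error in `[.data.frame`(x, , which) : undefined columns selected
I suspect that somewhere inside the plm function, it must be adding a "group" column to your data.frame, which clashes with your own column of that name.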
data("Produc", package = "plm")
form <- log(gsp) ~ log(pc)
Produc$group = Produc$region
pProduc <- pdata.frame(Produc, index = c("group"))
summary(plm(form, data = pProduc, model = "random"))
Error in `[.data.frame`(x, , which) : undefined columns selected
Using the "region" column that I copied it from, it works:
pProduc <- pdata.frame(Produc, index = c("region"))
summary(plm(form, data = pProduc, model = "random"))
Oneway (individual) effect Random Effect Model
   (Swamy-Arora's transformation)

Call:
plm(formula = form, data = pProduc, model = "random")

Unbalanced Panel: n = 9, T = 51-136, N = 816

Effects:
                  var std.dev share
idiosyncratic 0.03691 0.19213 0.402
individual    0.05502 0.23457 0.598
theta:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8861  0.9012  0.9192  0.9157  0.9299  0.9299

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.68180 -0.11014  0.00977 -0.00039  0.13815  0.45491

Coefficients:
             Estimate Std. Error  z-value  Pr(>|z|)
(Intercept) -1.099088   0.138395  -7.9417 1.994e-15 ***
log(pc)      1.100102   0.010623 103.5627 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    459.71
Residual Sum of Squares: 30.029
R-Squared:      0.93468
Adj. R-Squared: 0.9346
Chisq: 11647.6 on 1 DF, p-value: < 2.22e-16
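For your example, just rename the "group" column and convert it to a factor to avoid other errors. (For "pooling", it should be treated as categorical rather than numeric.)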
dtable <- merge(dt1 ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)
dtable$group = factor(dtable$group)
colnames(dtable)[4] = "GROUP"
dtable_p <- pdata.frame(dtable, index = "GROUP")
summary(plm(sat ~ income, data = dtable_p, model = "pooling"))
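The renaming workaround can be wrapped in a small helper so it is applied before every call to pdata.frame(). This is only a sketch: safe_index_name() and its reserved argument are hypothetical names invented for this illustration, and the assumption that "group" is the only problematic name comes from the behaviour observed above, not from the plm documentation. Whether the clash occurs at all may depend on the plm version installed.

```r
library(plm)

data("Produc", package = "plm")
Produc$group <- Produc$region  # reproduce the problematic column name

# Hypothetical helper: if the index column has a name assumed to clash
# with a name plm uses internally, rename it before building the
# pdata.frame. The default 'reserved' vector is an assumption.
safe_index_name <- function(df, index, reserved = c("group")) {
  if (index %in% reserved) {
    new_name <- paste0(index, "_id")
    names(df)[names(df) == index] <- new_name
    index <- new_name
  }
  list(data = df, index = index)
}

res <- safe_index_name(Produc, "group")
pProduc <- pdata.frame(res$data, index = res$index)
fit <- plm(log(gsp) ~ log(pc), data = pProduc, model = "pooling")
summary(fit)
```

The helper leaves any other index name untouched, so it can be called unconditionally; extend reserved if other names turn out to cause the same error.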