undefined columns selected in plm() function
I've run into a strange problem with the plm() function. Here is the code:
library(data.table)
library(tidyverse)
library(plm)
#Data Generation
n <- 500
set.seed(75080)
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 50
y <- -100*z+ 1100 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt1 <- data.table('id'=1:500,'sat'=y,'income'=x,'group'=rep(1,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 80
y <- -80*z+ 1200 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt2 <- data.table('id'=501:1000,'sat'=y,'income'=x,'group'=rep(2,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 30
y <- -120*z+ 1000 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200,200,y)
y <- ifelse(y>1600,1600,y)
dt3 <- data.table('id'=1001:1500,'sat'=y,'income'=x,'group'=rep(3,n))
dtable <- merge(dt1 ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)
# Model
dtable_p <- pdata.frame(dtable, index = "group")
mod_1 <- plm(sat ~ income, data = dtable_p,model = "pooling")
Error in `[.data.frame`(x, , which) : undefined columns selected
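I've checked every possibility I can think of, but I don't understand why it gives me an error. The column names are correct, so why does R say the columns are undefined? Thanks!
Follow-up: I added a test with another dataset, the one @StupidWolf used to demonstrate the problem: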
data("Produc", package = "plm")
form <- log(gsp) ~ log(pc)
Produc$group <- Produc$region
pProduc <- pdata.frame(Produc, index = "group")
Produc$group <- rep(1:48, each = 17)
summary(plm(form, data = pProduc, model = "pooling"))
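Error in `[.data.frame`(x, , which) : undefined columns selected
This is so strange. The answer is that the index cannot be named "group".
I suspect that somewhere inside plm(), a column named "group" gets added to your data.frame.
We can reproduce this with an example dataset: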
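The message itself comes from base R: `[.data.frame` raises it whenever code asks for a column by a name the data frame does not contain. Since `sat` and `income` are present, this suggests the failing lookup is for some name plm() uses internally. A minimal base-R sketch, independent of plm (the toy data frame `df` is made up for illustration):

```r
# Base R reproduction of the message: selecting a column name that is not
# in the data frame makes `[.data.frame` stop with this error.
df <- data.frame(sat = c(1100, 1200), income = c(50, 80))
msg <- tryCatch(df[, "group"], error = function(e) conditionMessage(e))
print(msg)  # in an English locale: "undefined columns selected"
```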
data("Produc", package = "plm")
form <- log(gsp) ~ log(pc)
Produc$group = Produc$region
pProduc <- pdata.frame(Produc, index = c("group"))
summary(plm(form, data = pProduc, model = "random"))
Error in `[.data.frame`(x, , which) : undefined columns selected
Using the "region" column that I copied it from, it works:
pProduc <- pdata.frame(Produc, index = c("region"))
summary(plm(form, data = pProduc, model = "random"))
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = form, data = pProduc, model = "random")
Unbalanced Panel: n = 9, T = 51-136, N = 816
Effects:
var std.dev share
idiosyncratic 0.03691 0.19213 0.402
individual 0.05502 0.23457 0.598
theta:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8861 0.9012 0.9192 0.9157 0.9299 0.9299
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.68180 -0.11014 0.00977 -0.00039 0.13815 0.45491
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) -1.099088 0.138395 -7.9417 1.994e-15 ***
log(pc) 1.100102 0.010623 103.5627 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 459.71
Residual Sum of Squares: 30.029
R-Squared: 0.93468
Adj. R-Squared: 0.9346
Chisq: 11647.6 on 1 DF, p-value: < 2.22e-16
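A sketch for testing the "group" hypothesis on the same data: it fits the random-effects model above with the region variable stored under different column names and reports, for each name, whether plm() succeeds. The helper `try_index_name()` and the control name `"grp"` are hypothetical. The code prints whatever the installed plm version does instead of assuming an outcome, since the behaviour may differ between plm releases.

```r
# Hypothetical diagnostic helper (not part of plm): copy the grouping variable
# `index_col` into a column called `new_name`, index the panel by that column,
# and return "ok" if plm() runs or the error message if any step fails.
try_index_name <- function(data, index_col, new_name, formula, ...) {
  d <- data
  d[[new_name]] <- d[[index_col]]
  tryCatch({
    pd <- plm::pdata.frame(d, index = new_name)
    plm::plm(formula, data = pd, ...)
    "ok"
  }, error = function(e) conditionMessage(e))
}

if (requireNamespace("plm", quietly = TRUE)) {
  data("Produc", package = "plm")
  form <- log(gsp) ~ log(pc)
  # Same grouping variable (region) stored under three different column names
  for (nm in c("region", "group", "grp")) {
    cat(nm, ":", try_index_name(Produc, "region", nm, form, model = "random"), "\n")
  }
}
```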
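For your example, just rename the "group" column and convert it to a factor to avoid other errors (for "pooling", it should be treated as categorical rather than numeric):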
dtable <- merge(dt1 ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)
dtable$group <- factor(dtable$group)
setnames(dtable, "group", "GROUP")  # rename by name rather than by column position
dtable_p <- pdata.frame(dtable, index = "GROUP")
summary(plm(sat ~ income, data = dtable_p, model = "pooling"))
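The same fix can be wrapped in a small helper so it is applied consistently before every pdata.frame() call. This is a sketch: `prepare_panel_index()` is a hypothetical name, and the default replacement name "GROUP" follows the answer above. It returns a plain data.frame copy, so the caller's data.table is left unchanged.

```r
# Hypothetical helper (not part of plm): rename a column called "group" to
# `new_name` and store it as a factor, so the panel index is categorical
# and no longer uses the name "group".
prepare_panel_index <- function(df, old_name = "group", new_name = "GROUP") {
  stopifnot(old_name %in% names(df), !(new_name %in% names(df)))
  df <- as.data.frame(df)  # plain data.frame copy; also accepts a data.table
  df[[old_name]] <- factor(df[[old_name]])
  names(df)[names(df) == old_name] <- new_name
  df
}

# Usage with the question's data (dtable built from dt1, dt2, dt3):
# dtable_p <- plm::pdata.frame(prepare_panel_index(dtable), index = "GROUP")
# summary(plm::plm(sat ~ income, data = dtable_p, model = "pooling"))
```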