R:GLM 模型和 optim() 包之间估计参数的差异
R: The diferrence of estimate parameter between GLM model and optim() package
我的目的是在 R 中使用 optim() 包找到估计参数。
我将我的结果与 R 中的 GLM 模型进行比较。代码是
d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=factor(d$disease)
d$ses=factor(d$ses)
d$sector=factor(d$sector)
str(d)
oreduced <- glm(disease~age+sector, family=binomial(link=logit), data=d)
summary(oreduced)
y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
nlldbin=function(param){
eta<-param[1]+param[2]*x1+param[3]*x2
p<-1/(1+exp(-eta))
-sum(y*log(p)+(1-y)*log(1-p),na.rm=TRUE)
}
MLE_estimates<-optim(c(Intercept=0.1,age=0.1,sector2=0.1),nlldbin,hessian=TRUE)
MLE_estimatesenter
Wih GlM 结果是
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.15966 0.34388 -6.280 3.38e-10 ***
age 0.02681 0.00865 3.100 0.001936 **
sector2 1.18169 0.33696 3.507 0.000453 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
并使用 optim()
$par
Intercept age sector2
-3.34005918 0.02680405 1.18101449
谁能告诉我为什么不一样?以及如何解决这个问题?谢谢
优化版本的x2编码错误。尝试:
nlldbin <- function(param) {
eta <- param[1] + param[2] * x1 + param[3] * (x2 == 2)
p <- 1 / (1 + exp(-eta))
- sum(y * log(p) + (1-y) * log(1-p), na.rm = TRUE)
}
st <- c(Intercept = 0.1, age = 0.1, sector2 = 0.1)
MLE_estimates <- optim(st, nlldbin, hessian = TRUE)
MLE_estimates$par
## Intercept age sector2
## -2.15932867 0.02680381 1.18158898
coef(oreduced)
## (Intercept) age sector2
## -2.15965912 0.02681289 1.18169345
注意如果你愿意使用dbinom和plogis,nlldbin中的最后两行可以这样写:
-sum(dbinom(y, 1, plogis(eta), log = TRUE))
我的目的是在 R 中使用 optim() 包找到估计参数。 我将我的结果与 R 中的 GLM 模型进行比较。代码是
d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=factor(d$disease)
d$ses=factor(d$ses)
d$sector=factor(d$sector)
str(d)
oreduced <- glm(disease~age+sector, family=binomial(link=logit), data=d)
summary(oreduced)
y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
nlldbin=function(param){
eta<-param[1]+param[2]*x1+param[3]*x2
p<-1/(1+exp(-eta))
-sum(y*log(p)+(1-y)*log(1-p),na.rm=TRUE)
}
MLE_estimates<-optim(c(Intercept=0.1,age=0.1,sector2=0.1),nlldbin,hessian=TRUE)
MLE_estimatesenter
Wih GlM 结果是
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.15966 0.34388 -6.280 3.38e-10 ***
age 0.02681 0.00865 3.100 0.001936 **
sector2 1.18169 0.33696 3.507 0.000453 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
并使用 optim()
$par
Intercept age sector2
-3.34005918 0.02680405 1.18101449
谁能告诉我为什么不一样?以及如何解决这个问题?谢谢
优化版本的x2编码错误。尝试:
nlldbin <- function(param) {
eta <- param[1] + param[2] * x1 + param[3] * (x2 == 2)
p <- 1 / (1 + exp(-eta))
- sum(y * log(p) + (1-y) * log(1-p), na.rm = TRUE)
}
st <- c(Intercept = 0.1, age = 0.1, sector2 = 0.1)
MLE_estimates <- optim(st, nlldbin, hessian = TRUE)
MLE_estimates$par
## Intercept age sector2
## -2.15932867 0.02680381 1.18158898
coef(oreduced)
## (Intercept) age sector2
## -2.15965912 0.02681289 1.18169345
注意如果你愿意使用dbinom和plogis,nlldbin中的最后两行可以这样写:
-sum(dbinom(y, 1, plogis(eta), log = TRUE))