手动编码的泊松对数似然函数 returns 与交互式模型的 glm 不同的结果
Manually coded Poisson log likelihood function returns a different result from glm for interactive models
我已经编写了自己的泊松似然函数,但它返回的值与具有特定数据交互作用的模型的 glm 有很大不同。请注意,该函数从我尝试过的所有其他数据以及没有与此数据交互的模型中吐出与 glm 完全相同的结果。
> # Log likelihood function
> llpoi = function(X, y){
+ # Ensures X is a matrix
+ if(class(X) != "matrix") X = as.matrix(X)
+ # Ensures there's a constant
+ if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
+ # A useful scalar that I'll need below
+ k = ncol(X)
+ ## Function to be maximized
+ FUN = function(par, X, y){
+ # beta hat -- the parameter we're trying to estimate
+ betahat = par[1:k]
+ # mu hat -- the systematic component
+ muhat = X %*% betahat
+ # Log likelihood function
+ sum(muhat * y - exp(muhat))
+ }
+ # Optimizing
+ opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = T)
+ # Results, including getting the SEs from the hessian
+ cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))
+ }
>
> # Defining inputs
> y = c(2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 3, 1, 1, 3, 2, 2, 2, 3, 1, 2, 4, 3, 3, 3, 1, 3, 0, 2, 1, 2, 4, 1, 2, 0, 2, 1, 2, 1, 4, 1, 2, 0)
> x1 = c(8, 1, 0, 3, 3, 3, 5, 4, 0.4, 1.5, 2, 1, 1, 7, 2, 3, 0, 2, 1.5, 5, 1, 4, 5.5, 6, 3, 3, 2, 0.5, 5, 10, 3, 22, 20, 3, 20, 10, 15, 25, 15, 6, 3.5, 5, 18, 2, 15.0, 16, 24)
> x2 = c(12, 12, 12, 16, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 12, 9, 12, 12, 9, 16, 9, 6, 12, 9, 9, 12, 12, 12, 12, 14, 14, 14, 9, 12, 9, 12, 3, 12, 9, 6, 12, 12, 12, 12, 12, 12, 9)
>
> # Results
> withmyfun = llpoi(cbind(x1, x2, x1 * x2), y)
> round(withmyfun, 2)
[,1] [,2]
[1,] 0.96 0.90
[2,] -0.05 0.09
[3,] -0.02 0.08
[4,] 0.00 0.01
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson")
> round(summary(withglm)$coef[, 1:2], 2)
Estimate Std. Error
(Intercept) 1.08 0.90
x1 -0.07 0.09
x2 -0.03 0.08
x1:x2 0.00 0.01
这是特定数据吗?它是固有的
优化过程,最终会与 glm 产生更大的差异,而我只是对这些数据感到不走运?是使用method = "BFGS" for optim的函数吗?
通过重新缩放右侧变量,结果有了很大改善。
> library(data.table)
> setDT(tmp)
> tmp[, x1 := scale(x1)][, x2 := scale(x2)]
>
>
> withmyfun = with(tmp, llpoi(cbind(x1, x2, x1 * x2), y))
> withmyfun
[,1] [,2]
[1,] 0.57076392 0.1124637
[2,] -0.19620040 0.1278070
[3,] -0.01509032 0.1169019
[4,] 0.05636459 0.1380611
>
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson", data = tmp)
> summary(withglm)$coef[, 1:2]
Estimate Std. Error
(Intercept) 0.57075132 0.1124641
x1 -0.19618199 0.1278061
x2 -0.01507467 0.1169034
x1:x2 0.05636934 0.1380621
>
因此,我的建议是,在 llpoi
内部,有一个程序在使用 optim
数据之前对变量进行归一化,并根据函数 returns 重新调整估计值价值。您的示例数据范围太大,导致系数估计值非常小。由于微不足道的变量导致相对平坦的似然面,这个问题变得更糟。
注:
除了拦截之外,您可以从中获得非常接近的输出。我所说的标准化的意思就是这样。
llpoi = function(X, y){
# Ensures X is a matrix
if(class(X) != "matrix") X = as.matrix(X)
# Ensures there's a constant
if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
# A useful scalar that I'll need below
avgs <- c(0, apply(X[, 2:ncol(X)], 2, mean))
sds <- c(1, apply(X[, 2:ncol(X)], 2, sd))
X<- t((t(X) - avgs)/sds)
k = ncol(X)
## Function to be maximized
FUN = function(par, X, y){
# beta hat -- the parameter we're trying to estimate
betahat = par[1:k]
# mu hat -- the systematic component
muhat = X %*% betahat
# Log likelihood function
sum(muhat * y - exp(muhat))
}
# Optimizing
opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = T)
# Results, including getting the SEs from the hessian
cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))/sds
}
经过大量研究,我了解到这两个结果不同,因为 glm.fit,glm 背后的主力通过 Newton-Raphson 方法优化函数,而我在 llpoi 函数中使用 BFGS。 BFGS 速度更快,但精度较低。在大多数情况下,这两个结果将非常相似,但当表面区域太平坦或最大值太多时,可能会有更大的差异,正如 amatsuo_net 正确指出的那样,因为 BFGS 使用的爬升算法会卡住.
我已经编写了自己的泊松似然函数,但它返回的值与具有特定数据交互作用的模型的 glm 有很大不同。请注意,该函数从我尝试过的所有其他数据以及没有与此数据交互的模型中吐出与 glm 完全相同的结果。
> # Log likelihood function
> llpoi = function(X, y){
+ # Ensures X is a matrix
+ if(class(X) != "matrix") X = as.matrix(X)
+ # Ensures there's a constant
+ if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
+ # A useful scalar that I'll need below
+ k = ncol(X)
+ ## Function to be maximized
+ FUN = function(par, X, y){
+ # beta hat -- the parameter we're trying to estimate
+ betahat = par[1:k]
+ # mu hat -- the systematic component
+ muhat = X %*% betahat
+ # Log likelihood function
+ sum(muhat * y - exp(muhat))
+ }
+ # Optimizing
+ opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = T)
+ # Results, including getting the SEs from the hessian
+ cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))
+ }
>
> # Defining inputs
> y = c(2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 3, 1, 1, 3, 2, 2, 2, 3, 1, 2, 4, 3, 3, 3, 1, 3, 0, 2, 1, 2, 4, 1, 2, 0, 2, 1, 2, 1, 4, 1, 2, 0)
> x1 = c(8, 1, 0, 3, 3, 3, 5, 4, 0.4, 1.5, 2, 1, 1, 7, 2, 3, 0, 2, 1.5, 5, 1, 4, 5.5, 6, 3, 3, 2, 0.5, 5, 10, 3, 22, 20, 3, 20, 10, 15, 25, 15, 6, 3.5, 5, 18, 2, 15.0, 16, 24)
> x2 = c(12, 12, 12, 16, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 12, 9, 12, 12, 9, 16, 9, 6, 12, 9, 9, 12, 12, 12, 12, 14, 14, 14, 9, 12, 9, 12, 3, 12, 9, 6, 12, 12, 12, 12, 12, 12, 9)
>
> # Results
> withmyfun = llpoi(cbind(x1, x2, x1 * x2), y)
> round(withmyfun, 2)
[,1] [,2]
[1,] 0.96 0.90
[2,] -0.05 0.09
[3,] -0.02 0.08
[4,] 0.00 0.01
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson")
> round(summary(withglm)$coef[, 1:2], 2)
Estimate Std. Error
(Intercept) 1.08 0.90
x1 -0.07 0.09
x2 -0.03 0.08
x1:x2 0.00 0.01
这是特定数据吗?它是固有的 优化过程,最终会与 glm 产生更大的差异,而我只是对这些数据感到不走运?是使用method = "BFGS" for optim的函数吗?
通过重新缩放右侧变量,结果有了很大改善。
> library(data.table)
> setDT(tmp)
> tmp[, x1 := scale(x1)][, x2 := scale(x2)]
>
>
> withmyfun = with(tmp, llpoi(cbind(x1, x2, x1 * x2), y))
> withmyfun
[,1] [,2]
[1,] 0.57076392 0.1124637
[2,] -0.19620040 0.1278070
[3,] -0.01509032 0.1169019
[4,] 0.05636459 0.1380611
>
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson", data = tmp)
> summary(withglm)$coef[, 1:2]
Estimate Std. Error
(Intercept) 0.57075132 0.1124641
x1 -0.19618199 0.1278061
x2 -0.01507467 0.1169034
x1:x2 0.05636934 0.1380621
>
因此,我的建议是,在 llpoi
内部,有一个程序在使用 optim
数据之前对变量进行归一化,并根据函数 returns 重新调整估计值价值。您的示例数据范围太大,导致系数估计值非常小。由于微不足道的变量导致相对平坦的似然面,这个问题变得更糟。
注:
除了拦截之外,您可以从中获得非常接近的输出。我所说的标准化的意思就是这样。
llpoi = function(X, y){
# Ensures X is a matrix
if(class(X) != "matrix") X = as.matrix(X)
# Ensures there's a constant
if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
# A useful scalar that I'll need below
avgs <- c(0, apply(X[, 2:ncol(X)], 2, mean))
sds <- c(1, apply(X[, 2:ncol(X)], 2, sd))
X<- t((t(X) - avgs)/sds)
k = ncol(X)
## Function to be maximized
FUN = function(par, X, y){
# beta hat -- the parameter we're trying to estimate
betahat = par[1:k]
# mu hat -- the systematic component
muhat = X %*% betahat
# Log likelihood function
sum(muhat * y - exp(muhat))
}
# Optimizing
opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = T)
# Results, including getting the SEs from the hessian
cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))/sds
}
经过大量研究,我了解到这两个结果不同,因为 glm.fit,glm 背后的主力通过 Newton-Raphson 方法优化函数,而我在 llpoi 函数中使用 BFGS。 BFGS 速度更快,但精度较低。在大多数情况下,这两个结果将非常相似,但当表面区域太平坦或最大值太多时,可能会有更大的差异,正如 amatsuo_net 正确指出的那样,因为 BFGS 使用的爬升算法会卡住.