Offset specification in R

Question

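Reading the description of glm in R, it is not clear to me what the difference is between specifying a model offset in the formula and using the offset argument.

In my model I have a response y that should be divided by an offset term w; for simplicity, assume we have a single covariate x. I use a log link.

What is the difference between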

glm(log(y)~x+offset(-log(w)))

and

glm(log(y)~x,offset=-log(w))

Answer 1

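The two ways are identical.

This can be seen in the documentation of the offset argument (the relevant part is the sentence beginning "One or more offset terms"):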

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

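The passage above describes the offset argument of the glm function and says that offset terms can be included in the formula instead of, or as well as, that argument.

A quick example below shows that this is true:

Data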

y <- sample(1:2, 50, replace=TRUE)
x <- runif(50)
w <- 1:50
df <- data.frame(y,x)

First model:

> glm(log(y)~x+offset(-log(w)))

Call:  glm(formula = log(y) ~ x + offset(-log(w)))

Coefficients:
(Intercept)            x  
     3.6272      -0.4152  

Degrees of Freedom: 49 Total (i.e. Null);  48 Residual
Null Deviance:      44.52 
Residual Deviance: 43.69    AIC: 141.2

Second way:

> glm(log(y)~x,offset=-log(w))

Call:  glm(formula = log(y) ~ x, offset = -log(w))

Coefficients:
(Intercept)            x  
     3.6272      -0.4152  

Degrees of Freedom: 49 Total (i.e. Null);  48 Residual
Null Deviance:      44.52 
Residual Deviance: 43.69    AIC: 141.2

As you can see, the two fits are identical.
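The same comparison can be made programmatically rather than by reading the printed output. The following is a minimal sketch, not part of the original answer: the seed, the object names (fit_formula, fit_argument, fit_split, fit_sum) and the extra vectors a and b are assumptions introduced only for illustration. It also checks the documented statement that several offsets are summed.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
y <- sample(1:2, 50, replace = TRUE)
x <- runif(50)
w <- 1:50

# Offset given as a formula term vs. through the offset argument
fit_formula  <- glm(log(y) ~ x + offset(-log(w)))
fit_argument <- glm(log(y) ~ x, offset = -log(w))

all.equal(coef(fit_formula), coef(fit_argument))
all.equal(deviance(fit_formula), deviance(fit_argument))

# Hypothetical extra known components a and b: one offset in the formula
# plus one in the argument should behave like a single offset a + b
a <- rnorm(50)
b <- rnorm(50)
fit_split <- glm(log(y) ~ x + offset(a), offset = b)
fit_sum   <- glm(log(y) ~ x + offset(a + b))

all.equal(coef(fit_split), coef(fit_sum))
```

Comparing with all.equal rather than identical allows for tiny floating-point differences between the two fits.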

Answer 2

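I just wanted to add that when you put the offset in the formula, as in glm(log(y)~x+offset(-log(w))), and later want to predict on new data, the prediction will take w (the offset in this case) into account. If you instead supply the offset through the offset argument, the prediction will not take the offset into account.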
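One way to check this claim on your own installation is to predict on new data with both fits and compare against a linear predictor rebuilt by hand. This is a hedged sketch rather than a statement about what every R version does: the data frames d and new_d, the seed and the object names are assumptions made for illustration, and how the offset-argument fit is handled by predict may depend on the R version, so its result is printed rather than asserted.

```r
set.seed(3)                          # arbitrary seed, only for reproducibility
d <- data.frame(x = runif(50), w = 1:50)
d$y <- sample(1:2, 50, replace = TRUE)

fit_formula  <- glm(log(y) ~ x + offset(-log(w)), data = d)
fit_argument <- glm(log(y) ~ x, offset = -log(w), data = d)

# Hypothetical new observations, including new values of w
new_d <- data.frame(x = c(0.2, 0.8), w = c(5, 500))

# Linear predictor rebuilt by hand: intercept + slope * x + offset
manual <- drop(cbind(1, new_d$x) %*% coef(fit_formula)) - log(new_d$w)

pred_formula  <- predict(fit_formula,  newdata = new_d)
pred_argument <- predict(fit_argument, newdata = new_d)

# The formula-offset fit should reproduce the hand-built predictor
all.equal(as.numeric(pred_formula), manual)

# Inspect the offset-argument fit next to it
print(cbind(manual, pred_formula, pred_argument))
```

If the pred_argument column differs from manual by exactly log(w), the offset was ignored for that fit; if the columns agree, your R version applied it.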
