从 lm() 中获取 "mlm" 对象的标准化残差和 "Residual v.s. Fitted" 图

Question

set.seed(0)
## 2 response of 10 observations each
response <- matrix(rnorm(20), 10, 2)
## 3 covariates with 10 observations each
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

我一直在使用以下方法为整个模型生成残差图：

plot(fitted(fit),residuals(fit))

但是，我想为每个预测协变量绘制单独的图。我可以一次做一个：

f <- fitted(fit)
r <- residual(fit)
plot(f[,1],r[,1])

然而，这种方法的问题在于它需要对具有更多预测变量协变量的数据集进行泛化。有没有一种方法可以在遍历 (f) 和 (r) 的每一列时使用绘图？或者有没有一种方法 plot() 可以按颜色对每个协变量进行分组？

Answer 1

我就是这样解决的。

for(i in 1:ncol(f)) {
    plot(f[,i],r[,i])
}

脑洞大开。

Answer 2

确保您使用的是标准化残差而不是原始残差

我经常看到 plot(fitted(fit), residuals(fit)) 但它在统计上是错误的。我们使用 plot(fit) 生成诊断图，因为我们需要标准化残差而不是原始残差。 阅读 ?plot.lm 了解更多信息。但是 "mlm" 的 plot 方法支持不佳：

plot(fit)
# Error: 'plot.mlm' is not implemented yet

为"mlm"

定义"rstandard" S3方法不支持

plot.mlm 的原因有很多，其中之一就是缺少 rstandard.mlm。对于"lm"和"glm"class，有一个通用的S3方法"rstandard"来获取标准化残差：

methods(rstandard)
# [1] rstandard.glm* rstandard.lm*

不支持"mlm"。所以我们先填补这个空白。

得到标准化残差并不难。令 hii 为帽子矩阵的对角线，残差的逐点估计标准误差为 sqrt(1 - hii) * sigma，其中 sigma = sqrt(RSS / df.residual) 为估计的残差标准误差。 RSS为残差平方和； df.residual为剩余自由度

hii 可以从模型矩阵的 QR 分解的矩阵因子 Q 计算：rowSums(Q ^ 2)。对于 "mlm"，只有一个 QR 分解，因为模型矩阵对于所有响应都是相同的，因此我们只需要计算一次 hii。

不同的反应有不同的sigma，但都是优雅的colSums(residuals(fit) ^ 2) / df.residual(fit)。

现在，让我们总结一下这些想法，为 "mlm" 获得我们自己的 "rstandard" 方法：

## define our own "rstandard" method for "mlm" class
rstandard.mlm <- function (model) {
  Q <- with(model, qr.qy(qr, diag(1, nrow = nrow(qr$qr), ncol = qr$rank)))  ## Q matrix
  hii <- rowSums(Q ^ 2)  ## diagonal of hat matrix QQ'
  RSS <- colSums(model$residuals ^ 2)  ## residual sums of squares (for each model)
  sigma <- sqrt(RSS / model$df.residual)  ##  ## Pearson estimate of residuals (for each model)
  pointwise_sd <- outer(sqrt(1 - hii), sigma)  ## point-wise residual standard error (for each model)
  model$residuals / pointwise_sd  ## standardised residuals
  }

注意在函数名中使用 .mlm 来告诉 R 这是关联的 S3 方法。一旦我们定义了这个函数，我们就可以在"rstandard"方法中看到它：

## now there are method for "mlm"
methods(rstandard)
# [1] rstandard.glm* rstandard.lm*  rstandard.mlm

要调用这个函数，我们不必显式调用rstandard.mlm；调用 rstandard 就足够了：

## test with your fitted model `fit`
rstandard(fit)
#          [,1]       [,2]
#1   1.56221865  2.6593505
#2  -0.98791320 -1.9344546
#3   0.06042529 -0.4858276
#4   0.18713629  2.9814135
#5   0.11277397  1.4336484
#6  -0.74289985 -2.4452868
#7   0.03690363  0.7015916
#8  -1.58940448 -1.2850961
#9   0.38504435  1.3907223
#10  1.34618139 -1.5900891

标准化残差 N(0, 1) 分布。

获取残差v.s。 "mlm"

的拟合图

您的初始尝试：

f <- fitted(fit); r <- rstandard(fit); plot(f, r)

这不是个坏主意，前提是不同模型的点可以相互识别。所以我们可以尝试为不同的模型使用不同的点颜色：

plot(f, r, col = as.numeric(col(f)), pch = 19)

col、pch 和 cex 等图形参数可以采用向量输入。我要求 plot 对 r[,j] ~ f[,j] 使用 col = j，其中 j = 1, 2,..., ncol(f)。阅读 ?par 的 "Color Specification" 了解 col = j 的含义。 pch = 19 告诉 plot 绘制实心点。阅读 basic graphcial parameters 了解各种选择。

最后你可能想要一个图例。你可以做到

plot(f, r, col = as.numeric(col(f)), pch = 19, ylim = c(-3, 4))
legend("topleft", legend = paste0("response ", 1:ncol(f)), pch = 19,
       col = 1:ncol(f), text.col = 1:ncol(f))

为了给图例框留下 space，我们稍微扩展了 ylim。由于标准化残差是 N(0,1)，ylim = c(-3, 3) 是一个很好的范围。如果我们想将图例框放在左上角，我们将 ylim 扩展为 c(-3, 4)。您可以通过 ncol、title 等进一步自定义您的图例

你有多少条回复？

如果您的回复不超过几个，上述建议很有效。如果你有很多，建议将它们绘制在单独的图中。您发现 for 循环是不错的，除了您需要将绘图区域拆分为不同的子图，可能使用 par(mfrow = c(?, ?))。如果采用这种方法，还可以设置内边距 mar 和外边距 oma。您可以阅读以获取执行此操作的一个示例。

如果您有更多回复，您可能想要两者兼而有之？假设你有 42 个响应，你可以做 par(mfrow = c(2, 3))，然后在每个子图中绘制 7 个响应。现在解决方案更多地基于意见。

从 lm() 中获取 "mlm" 对象的标准化残差和 "Residual v.s. Fitted" 图

Obtain standardised residuals and "Residual v.s. Fitted" plot for "mlm" object from `lm()`

regression

r

linear-regression

lm

mlm