How does plot.lm() determine outliers for the residuals vs fitted plot?
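How does plot.lm() decide which points count as outliers (i.e. which points to label) in the residuals vs fitted plot? The only thing I found in the documentation is: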
Details
sub.caption—by default the function call—is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page.
The ‘Scale-Location’ plot, also called ‘Spread-Location’ or ‘S-L’ plot, takes the square root of the absolute residuals in order to diminish skewness (sqrt(|E|) is much less skewed than |E| for Gaussian zero-mean E).
The ‘S-L’, the Q-Q, and the Residual-Leverage plot, use standardized residuals which have identical variance (under the hypothesis). They are given as R[i] / (s * sqrt(1 - h.ii)) where h.ii are the diagonal entries of the hat matrix, influence()$hat (see also hat), and where the Residual-Leverage plot uses standardized Pearson residuals (residuals.glm(type = "pearson")) for R[i].
The Residual-Leverage plot shows contours of equal Cook's distance, for values of cook.levels (by default 0.5 and 1) and omits cases with leverage one with a warning. If the leverages are constant (as is typically the case in a balanced aov situation) the plot uses factor level combinations instead of the leverages for the x-axis. (The factor levels are ordered by mean fitted value.)
In the Cook's distance vs leverage/(1-leverage) plot, contours of standardized residuals that are equal in magnitude are lines through the origin. The contour lines are labelled with the magnitudes.
But it does not say how the residuals vs fitted plot is produced, or how it chooses which points to label.
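For reference, the standardized residuals described in the quoted Details section can be reproduced by hand. This is a minimal sketch, assuming the built-in cars data and an ordinary lm fit (neither appears in the documentation excerpt); the variable names are illustrative only.

```r
# Assumption: the built-in `cars` data set stands in for any ordinary lm fit.
fit <- lm(dist ~ speed, data = cars)

r <- residuals(fit)        # raw residuals, R[i]
s <- summary(fit)$sigma    # residual standard error, s
h <- hatvalues(fit)        # diagonal entries of the hat matrix, h.ii

# Formula quoted from the Details section: R[i] / (s * sqrt(1 - h.ii))
manual <- r / (s * sqrt(1 - h))

# Compare with the built-in helper; the two should agree up to rounding.
same <- isTRUE(all.equal(manual, rstandard(fit)))
print(same)
```

This only checks the formula for a plain lm; for glm fits the documentation says Pearson residuals are used instead.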
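Update: Zheyuan Li's answer shows that the residuals vs fitted plot simply labels the 3 points with the largest residuals. That is indeed the case. It can be demonstrated with the following "extreme" example.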
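x = c(1,2,3,4,5,6)
y = c(2,4,6,8,10,12)   # y is exactly 2 * x, so the fit is perfect
foo = data.frame(x,y)
model = lm(y ~ x, data = foo)
plot(model, which = 1)   # residuals vs fitted plot for the perfect fit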
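The point of the example is that the labels come from a ranking rather than from a cutoff. A small sketch of that reasoning, assuming the labelling is purely rank-based as the answer describes:

```r
# Same perfect-fit data as in the example: y is exactly 2 * x.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 6, 8, 10, 12)
model <- lm(y ~ x)

# The residuals are zero up to floating-point noise ...
biggest <- max(abs(residuals(model)))
print(biggest)

# ... yet ranking them still yields three "largest" cases.
top3 <- order(abs(residuals(model)), decreasing = TRUE)[1:3]
print(top3)
```

Which three indices come out here depends on rounding noise, so they carry no statistical meaning.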
It picks out the 3 largest absolute standardised residuals. Consider this example:
fit <- lm(dist ~ speed, cars)
plot(fit, which = 1)
r <- rstandard(fit) ## get standardised residuals
order(abs(r), decreasing = TRUE)[1:3]
# [1] 49 23 35
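The same ranking can be done on the raw residuals, and the number of labels can be changed. As far as I can tell from the stats source, the which = 1 panel ranks the raw residuals, while the Q-Q and Scale-Location panels rank the standardised ones; treat that as an assumption to check against your own R version. For the cars fit both rankings pick the same three cases, so the distinction does not show up here.

```r
fit <- lm(dist ~ speed, data = cars)

# Rank by raw and by standardised residuals (assumption: see the note above
# on which of the two each panel uses).
top_raw <- order(abs(residuals(fit)), decreasing = TRUE)[1:3]
top_std <- order(abs(rstandard(fit)), decreasing = TRUE)[1:3]
print(top_raw)
print(top_std)

# id.n is a documented plot.lm argument (default 3): label 5 points instead.
plot(fit, which = 1, id.n = 5)
```

Setting id.n = 0 suppresses the labels altogether.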
These are the 3 points with the largest absolute residuals:
# selectedMod is the fitted lm object
r <- abs(selectedMod$residuals)
order(r, decreasing = TRUE)[1:3]