Does geom_smooth() of ggplot2 show pointwise confidence bands, or simultaneous confidence bands?
I'm not sure whether this question belongs here or on Cross Validated. I hope I made the right choice.
Consider the example:
library(dplyr)
setosa <- iris %>% filter(Species == "setosa") %>% select(Sepal.Length, Sepal.Width, Species)
library(ggplot2)
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2))
By default, ggplot "displays confidence interval around smooth" (see here), given by the gray area around the regression curve. I've always assumed these are simultaneous confidence bands for the regression curve, not pointwise confidence bands. The ggplot2 documentation refers to the predict function for details on how the standard errors are computed. However, the documentation for predict.lm does not say explicitly that simultaneous confidence bands are computed. So what is the correct interpretation here?
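One way to check what predict.lm() computes is to inspect the code (predict multiplies the standard errors by qt((1 - level)/2, df), so it does not appear to make any adjustment for simultaneous inference). Another way is to construct simultaneous confidence intervals and compare them with the intervals from predict.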
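The first check can also be done numerically. Below is a minimal sketch, assuming the same quadratic model as in the question: it rebuilds the interval by hand from the standard errors and an unadjusted t quantile, then compares it with what predict returns. The names `manual`, `builtin` and `tcrit` are illustrative only and do not come from predict.lm.

```r
# Rebuild predict.lm()'s confidence interval by hand (base R only).
setosa <- subset(iris, Species == "setosa")
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), data = setosa)

level <- 0.95
pr <- predict(fit, se.fit = TRUE)        # fitted values, standard errors, residual df
tcrit <- qt(1 - (1 - level) / 2, pr$df)  # plain t quantile, no multiplicity adjustment

# Interval built point by point: fit +/- tcrit * se
manual <- cbind(fit = pr$fit,
                lwr = pr$fit - tcrit * pr$se.fit,
                upr = pr$fit + tcrit * pr$se.fit)

# Interval reported by predict() itself
builtin <- predict(fit, interval = "confidence", level = level)

# Compare the two, ignoring dimnames
all.equal(manual, builtin, check.attributes = FALSE)
```

If the two agree, the interval depends only on each point's own standard error and a single t quantile, which is what a pointwise interval looks like.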
Fit the model and construct simultaneous confidence intervals:
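# Keep only the setosa rows, sorted by x so the band lines draw left to right
setosa <- subset(iris, Species == "setosa")
setosa <- setosa[order(setosa$Sepal.Length), ]
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), setosa)
# One linear combination of the coefficients per observed x value
K <- cbind(1, poly(setosa$Sepal.Length, 2))
# Simultaneous inference over all rows of K (requires the multcomp package)
cht <- multcomp::glht(fit, linfct = K)
cci <- confint(cht)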
Reshape and plot:
cc <- as.data.frame(cci$confint)
cc$Sepal.Length <- setosa$Sepal.Length
# Keep lwr, upr and Sepal.Length, then reshape to long format (requires reshape2)
cc <- reshape2::melt(cc[, 2:4], id.var = "Sepal.Length")
library(ggplot2)
# Gray band: geom_smooth(); red lines: simultaneous limits from glht
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  geom_line(data = cc,
            aes(x = Sepal.Length, y = value, group = variable),
            colour = "red")
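It seems that predict(.., interval = "confidence") does not produce simultaneous confidence intervals, so the gray band drawn by geom_smooth() is a pointwise one.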
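For a rough sense of how much wider a simultaneous band has to be, the sketch below compares the unadjusted t multiplier with the Scheffé (Working-Hotelling) multiplier sqrt(p * F), using base R only. This is a swapped-in technique, not what multcomp::glht computes: glht uses a multivariate t critical value for the observed x values, which should lie between the two multipliers shown here. The names `t_pointwise`, `w_scheffe` and `bands` are illustrative.

```r
# Pointwise vs. Scheffe (Working-Hotelling) multipliers for the same fit.
setosa <- subset(iris, Species == "setosa")
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), data = setosa)

level <- 0.95
p  <- length(coef(fit))   # number of regression coefficients
df <- df.residual(fit)    # residual degrees of freedom

t_pointwise <- qt(1 - (1 - level) / 2, df)  # unadjusted t quantile
w_scheffe   <- sqrt(p * qf(level, p, df))   # conservative simultaneous multiplier

# Both bands share the same centre and standard errors; only the multiplier differs
pr <- predict(fit, se.fit = TRUE)
bands <- data.frame(
  Sepal.Length = setosa$Sepal.Length,
  fit     = pr$fit,
  pw_lwr  = pr$fit - t_pointwise * pr$se.fit,
  pw_upr  = pr$fit + t_pointwise * pr$se.fit,
  sim_lwr = pr$fit - w_scheffe * pr$se.fit,
  sim_upr = pr$fit + w_scheffe * pr$se.fit
)

c(pointwise = t_pointwise, scheffe = w_scheffe)
```

Because both bands are centred on the same fitted curve, the simultaneous band encloses the pointwise one at every x value.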