ggplot2: How to draw small Gaussian densities on a regression line?
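I want to graphically show the assumptions of linear (and later other types of) regression. How can I add small Gaussian densities (or any type of density) on a regression line, as in the figure below: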
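You can compute the empirical densities of the residuals for sections along the fitted line. Then it is just a matter of using geom_path to draw the curves at the positions of your choosing in each interval. To add the theoretical distribution, generate some densities over the range of the residuals in each section (a normal density is used here). For the normal densities below, the standard deviation of each curve is estimated from the residuals of its own section, but you could instead choose a single standard deviation and use it for all sections.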
library(ggplot2)

## Sample data
set.seed(0)
x <- runif(100, 0, 50)
dat <- data.frame(x=x, y=rnorm(100, 10*x, 100))

## breaks: where you want to compute densities
breaks <- seq(0, max(dat$x), len=5)
dat$section <- cut(dat$x, breaks)

## Get the residuals
dat$res <- residuals(lm(y ~ x, data=dat))

## Compute densities for each section, and flip the axes, and add means of sections
## Note: the densities need to be scaled in relation to the section size (2000 here)
dens <- do.call(rbind, lapply(split(dat, dat$section), function(s) {
    ## s holds the rows of dat that fall in one section
    d <- density(s$res, n=50)
    res <- data.frame(x=max(s$x) - d$y*2000, y=d$x + mean(s$y))
    res <- res[order(res$y), ]
    ## Get some data for normal lines as well
    xs <- seq(min(s$res), max(s$res), len=50)
    res <- rbind(res, data.frame(y=xs + mean(s$y),
                                 x=max(s$x) - 2000*dnorm(xs, 0, sd(s$res))))
    res$type <- rep(c("empirical", "normal"), each=50)
    res
}))
dens$section <- rep(levels(dat$section), each=100)

## Plot both empirical and theoretical
ggplot(dat, aes(x, y)) +
    geom_point() +
    geom_smooth(method="lm", fill=NA, lwd=2) +
    geom_path(data=dens, aes(x, y, group=interaction(section,type), color=type), lwd=1.1) +
    theme_bw() +
    geom_vline(xintercept=breaks, lty=2)
Or, just the Gaussian curves
## Just normal
ggplot(dat, aes(x, y)) +
    geom_point() +
    geom_smooth(method="lm", fill=NA, lwd=2) +
    geom_path(data=dens[dens$type=="normal",], aes(x, y, group=section), color="salmon", lwd=1.1) +
    theme_bw() +
    geom_vline(xintercept=breaks, lty=2)
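The answer mentions that a single standard deviation could be used for all sections instead of one per section. A minimal sketch of that variant follows. It is not part of the original answer, and several choices in it are assumptions: the common standard deviation is taken from sigma() on the fitted model, the curves are evaluated on a grid of plus or minus three standard deviations, and the hand-tuned factor 2000 is replaced by a scale (the hypothetical name dens_scale) chosen so that the peak of each curve spans about 80% of a section width.

```r
library(ggplot2)

## Same simulated data as in the answer
set.seed(0)
x <- runif(100, 0, 50)
dat <- data.frame(x = x, y = rnorm(100, 10 * x, 100))

fit <- lm(y ~ x, data = dat)
breaks <- seq(0, max(dat$x), len = 5)
dat$section <- cut(dat$x, breaks)

## Assumption: one residual standard deviation shared by every section
sd_all <- sigma(fit)

## Assumption: scale the density so its peak covers 80% of a section width
## (replaces the hand-tuned 2000; dens_scale is a name made up for this sketch)
width <- diff(breaks)[1]
dens_scale <- 0.8 * width / dnorm(0, 0, sd_all)

## Assumption: same residual grid in every section, +/- 3 standard deviations
xs <- seq(-3 * sd_all, 3 * sd_all, len = 50)

## One flipped normal curve per section, anchored at the section's largest x
## and centred on the section's mean y, as in the answer
norm_dens <- do.call(rbind, lapply(split(dat, dat$section), function(s) {
    data.frame(x = max(s$x) - dens_scale * dnorm(xs, 0, sd_all),
               y = xs + mean(s$y),
               section = s$section[1])
}))

p <- ggplot(dat, aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", fill = NA, lwd = 2) +
    geom_path(data = norm_dens, aes(x, y, group = section),
              color = "salmon", lwd = 1.1) +
    theme_bw() +
    geom_vline(xintercept = breaks, lty = 2)
p
```

With a shared standard deviation every curve has the same shape, which matches the constant-variance assumption of linear regression. Keeping the per-section estimates, as in the answer, instead shows how far the data depart from that assumption.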