Shading confidence intervals in R - base R if possible
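I am comparing two lines fitted with LOESS regression. I would like to show the confidence intervals of both lines clearly, and I am having some difficulty.
I have tried various line types and colours, but the result is still too cluttered, in my opinion. I think shading between the confidence limits might make things clearer, but I am struggling to work this out given how my code is structured so far. I have included the resulting plot, the data for the two sets Analysis5k and Analysis5kz, and my code so far.
I have seen some examples where two polygons overlap to show where the confidence intervals overlap, which seems like a good way to present the data. If there is a way to draw a polygon only in the region shared by the two confidence intervals, that could be another good way to present it.
I understand the basic concept of how the polygon should be built, but the examples I have found were applied to much simpler lines and data. Part of the problem is my own fault for some poor organisation up to this point, but since this step is basically the finishing touch on my data presentation, I really don't want to redo everything from scratch.
Any help or insight is greatly appreciated.
UPDATE
I updated the title. I received some great examples using ggplot, and although I would like to start using ggplot in the future, so far I have only worked with base R. For this particular project I would like to keep it in base R if possible.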
Analysis5k
Period 15p5 Total_5plus
-4350 0.100529101 12.6
-3900 0.4 20
-3650 0.0625 9.6
-3900 0.126984127 16.8
-3958 0.133333333 5
-4350 0.150943396 10.6
-3400 0.146341463 8.2
-3650 0.255319149 9.4
-3400 0.222222222 9
-3500 0.245014245 39
-3600 0.125 8
-3808 0.1 20
-3900 0.160493827 18
-3958 0.238095238 7
-4058 0.2 5
-3500 0.086956522 28.75
-4117 0.141414141 6.6
-4350 0.171038825 31.76666667
-4350 0.166666667 6
-3650 0.143798024 30.36666667
-2715 0.137931034 7.25
-4350 0.235588972 26.6
-3500 0.228840125 79.75
-4350 0.041666667 8
-3650 0.174757282 20.6
-2715 0.377777778 11.25
-3500 0.2 7.5
-3650 0.078947368 7.6
-3400 0.208333333 24
-4233 0.184027778 19.2
-3650 0.285714286 12.6
-4350 0.166666667 6
Analysis5kz
Period 15p5 Total_5plus
-4350 0.100529101 12.6
-4350 0 5
-3900 0.4 20
-3650 0.0625 9.6
-3400 0 6
-3900 0.126984127 16.8
-3958 0.133333333 5
-4350 0.150943396 10.6
-3400 0.146341463 8.2
-3650 0.255319149 9.4
-3400 0.222222222 9
-3500 0.245014245 39
-3600 0.125 8
-3650 0 28
-3808 0.1 20
-3900 0.160493827 18
-3958 0.238095238 7
-4058 0.2 5
-3500 0 25
-3500 0.086956522 28.75
-4117 0.141414141 6.6
-4350 0.171038825 31.76666667
-4350 0.166666667 6
-3650 0.143798024 30.36666667
-2715 0.137931034 7.25
-4350 0.235588972 26.6
-3500 0.228840125 79.75
-4350 0.041666667 8
-3500 0 5
-3650 0.174757282 20.6
-3800 0 9
-2715 0.377777778 11.25
-3500 0.2 7.5
-3650 0.078947368 7.6
-4117 0 8
-4350 0 8
-3400 0.208333333 24
-4233 0.184027778 19.2
-3025 0 7
-3650 0.285714286 12.6
-4350 0.166666667 6
Code
ppi <- 300
png("5+ KC shaded CI.png", width=6*ppi, height=6*ppi, res=ppi)
library(Hmisc)
Analysis5k <- read.csv(file.choose(), header = T)
Analysis5kz <- read.csv(file.choose(), header = T)
par(mfrow = c(1,1), pty = "s", oma=c(1,2,1,1), mar=c(4,4,2,2))
plot(X15p5 ~ Period, Analysis5kz, xaxt = "n", yaxt= "n", ylim=c(-0.2,0.7), xlim=c(-5000,-2500), xlab = "Years B.P.", ylab = expression(''[15]*'p'[5]), main = "")
vx <- seq(-5000,-2000, by = 500)
vy <- seq(-0.2,0.7, by = 0.1)
axis(1, at = vx)
axis(2, at = vy)
a5k <- order(Analysis5k$Period)
a5kz <- order(Analysis5kz$Period)
Analysis5k.lo <- loess(X15p5 ~ Period, Analysis5k, weights = Total_5plus, span = 0.6)
Analysis5kz.lo <- loess(X15p5 ~ Period, Analysis5kz, weights = Total_5plus, span = 0.6)
pred5k <- predict(Analysis5k.lo, se = TRUE)
pred5kz <- predict(Analysis5kz.lo, se = TRUE)
lines(Analysis5k$Period[a5k], pred5k$fit[a5k], col="blue", lwd=2)
lines(Analysis5kz$Period[a5kz], pred5kz$fit[a5kz], col="skyblue", lwd=2)
# dashed 95% confidence limits around each loess fit
lines(Analysis5k$Period[a5k], pred5k$fit[a5k] - qt(0.975, pred5k$df)*pred5k$se.fit[a5k], col="blue", lty=2)
lines(Analysis5k$Period[a5k], pred5k$fit[a5k] + qt(0.975, pred5k$df)*pred5k$se.fit[a5k], col="blue", lty=2)
lines(Analysis5kz$Period[a5kz], pred5kz$fit[a5kz] - qt(0.975, pred5kz$df)*pred5kz$se.fit[a5kz], col="skyblue", lty=2)
lines(Analysis5kz$Period[a5kz], pred5kz$fit[a5kz] + qt(0.975, pred5kz$df)*pred5kz$se.fit[a5kz], col="skyblue", lty=2)
abline(h=0.173, lty=3)
abline(v=-4700, lty=3)
abline(v=-4000, lty=3)
abline(v=-3000, lty=3)
minor.tick(nx=5, ny=4, tick.ratio=0.5)
dev.off()
Here is one approach using ggplot:
(1) Apply the loess smoother to both data sets:
library(dplyr)
df.lo <- lapply(datlist, function(x)loess(X15p5 ~ Period, data=x, weights = Total_5plus, span = 0.6))
(2) Create a new data.frame spanning the minimum (-4350) and maximum (-2715) Period of the data sets:
nd1 <- nd2 <- expand.grid(Period=seq(-4350, -2715, length=100))
(3) Predict fit and se for each loess smoother and combine them into one data.frame:
nd1[,c("fit", "se")] <- predict(df.lo[[1]], newdata=nd1, se=T)[1:2]
nd1 <- nd1 %>% mutate(group="5k")
nd2[,c("fit", "se")] <- predict(df.lo[[2]], newdata=nd2, se=T)[1:2]
nd2 <- nd2 %>% mutate(group="5kz")
ndata <- rbind(nd1, nd2)
(4) With the predicted data, use ggplot2::geom_ribbon to show the overlapping se bands:
library(ggplot2)
p <- ggplot(ndata, aes(Period, fit)) +
  geom_line(aes(colour=group)) +
  geom_ribbon(aes(ymin=fit-1.96*se, ymax=fit+1.96*se, fill=group), alpha=.2)
p
(5) Add the data points and the reference lines (the abline calls in the question):
dat <- do.call(rbind, datlist)
p +
  geom_point(data=dat, aes(y=X15p5, shape=as.factor(group)), alpha=.2) +
  geom_hline(yintercept=0.173, linetype="dotted") +
  geom_vline(xintercept=c(-4700, -4000, -3000), linetype="dotted") +
  ylab("X15p5") +
  theme_bw()
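The steps above assume a list called datlist that holds both data frames. If you already have the question's two data frames loaded, a minimal sketch of building such a list might look like this. It assumes the columns are named Period, X15p5 and Total_5plus, as they come out of read.csv; the stand-in frames below hold only the first three rows of each data set.

```r
# Stand-ins holding the first three rows of each data set from the question.
Analysis5k <- data.frame(Period = c(-4350, -3900, -3650),
                         X15p5 = c(0.100529101, 0.4, 0.0625),
                         Total_5plus = c(12.6, 20, 9.6))
Analysis5kz <- data.frame(Period = c(-4350, -4350, -3900),
                          X15p5 = c(0.100529101, 0, 0.4),
                          Total_5plus = c(12.6, 5, 20))

# Named list with an added group column, one element per data set.
datlist <- list(`5k`  = transform(Analysis5k,  group = "5k"),
                `5kz` = transform(Analysis5kz, group = "5kz"))
str(datlist)
```

With the full data frames in place of the stand-ins, the same two lines at the end give a list that lapply and do.call(rbind, ...) can work on.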
The source data datlist is a list of the two data.frames "Analysis5k" and "Analysis5kz". Its dput output is as follows:
datlist <- structure(list(`5k` = structure(list(Period = c(-4350L, -3900L,
-3650L, -3900L, -3958L, -4350L, -3400L, -3650L, -3400L, -3500L,
-3600L, -3808L, -3900L, -3958L, -4058L, -3500L, -4117L, -4350L,
-4350L, -3650L, -2715L, -4350L, -3500L, -4350L, -3650L, -2715L,
-3500L, -3650L, -3400L, -4233L, -3650L, -4350L), X15p5 = c(0.100529101,
0.4, 0.0625, 0.126984127, 0.133333333, 0.150943396, 0.146341463,
0.255319149, 0.222222222, 0.245014245, 0.125, 0.1, 0.160493827,
0.238095238, 0.2, 0.086956522, 0.141414141, 0.171038825, 0.166666667,
0.143798024, 0.137931034, 0.235588972, 0.228840125, 0.041666667,
0.174757282, 0.377777778, 0.2, 0.078947368, 0.208333333, 0.184027778,
0.285714286, 0.166666667), Total_5plus = c(12.6, 20, 9.6, 16.8,
5, 10.6, 8.2, 9.4, 9, 39, 8, 20, 18, 7, 5, 28.75, 6.6, 31.76666667,
6, 30.36666667, 7.25, 26.6, 79.75, 8, 20.6, 11.25, 7.5, 7.6,
24, 19.2, 12.6, 6), group = c("5k", "5k", "5k", "5k", "5k", "5k",
"5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k",
"5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k", "5k",
"5k", "5k", "5k", "5k")), .Names = c("Period", "X15p5", "Total_5plus",
"group"), row.names = c(NA, 32L), class = "data.frame"), `5kz` =
structure(list(
Period = c(-4350L, -4350L, -3900L, -3650L, -3400L, -3900L,
-3958L, -4350L, -3400L, -3650L, -3400L, -3500L, -3600L, -3650L,
-3808L, -3900L, -3958L, -4058L, -3500L, -3500L, -4117L, -4350L,
-4350L, -3650L, -2715L, -4350L, -3500L, -4350L, -3500L, -3650L,
-3800L, -2715L, -3500L, -3650L, -4117L, -4350L, -3400L, -4233L,
-3025L, -3650L, -4350L), X15p5 = c(0.100529101, 0, 0.4, 0.0625,
0, 0.126984127, 0.133333333, 0.150943396, 0.146341463, 0.255319149,
0.222222222, 0.245014245, 0.125, 0, 0.1, 0.160493827, 0.238095238,
0.2, 0, 0.086956522, 0.141414141, 0.171038825, 0.166666667,
0.143798024, 0.137931034, 0.235588972, 0.228840125, 0.041666667,
0, 0.174757282, 0, 0.377777778, 0.2, 0.078947368, 0, 0, 0.208333333,
0.184027778, 0, 0.285714286, 0.166666667), Total_5plus = c(12.6,
5, 20, 9.6, 6, 16.8, 5, 10.6, 8.2, 9.4, 9, 39, 8, 28, 20,
18, 7, 5, 25, 28.75, 6.6, 31.76666667, 6, 30.36666667, 7.25,
26.6, 79.75, 8, 5, 20.6, 9, 11.25, 7.5, 7.6, 8, 8, 24, 19.2,
7, 12.6, 6), group = c("5kz", "5kz", "5kz", "5kz", "5kz",
"5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz",
"5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz",
"5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz",
"5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz", "5kz"
)), .Names = c("Period", "X15p5", "Total_5plus", "group"), row.names = 33:73, class = "data.frame")), .Names = c("5k",
"5kz"))
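I would propose a tidyverse solution. In this approach you first create a function that computes and extracts the required statistics. You then create a list column with nest, map the function over that list column, and unnest the result.
You can read more about this approach at http://r4ds.had.co.nz/many-models.html.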
library(tidyverse)
# create function to retrieve fit and se
pred_fun <- function(df) {
  model <- loess(`15p5` ~ Period, df, weights = Total_5plus, span = .6)
  preds <- predict(model, se = TRUE)
  # tibble() replaces the deprecated data_frame()
  tibble(fit = preds[["fit"]],
         se = preds[["se.fit"]])
}
# nest, map and unnest fits
# (tidyr >= 1.0 expects the columns to unnest wrapped in c())
nested <- bind_rows(df_5k, df_5kz) %>%
  group_by(origin) %>%
  nest() %>%
  mutate(preds = map(data, pred_fun)) %>%
  unnest(c(data, preds))
# plot result
ggplot(nested, aes(Period, `15p5`)) +
  geom_ribbon(aes(ymin = fit - 1.96 * se, ymax = fit + 1.96 * se, fill = origin),
              alpha = .2) +
  geom_point() +
  geom_line(aes(y = fit, colour = origin)) +
  scale_y_continuous(expand = c(.3, 0)) +
  scale_x_continuous(expand = c(.3, 0), breaks = scales::pretty_breaks(6)) +
  theme_bw() +
  theme(legend.position = "bottom") +
  labs(x = "Years B.P.", y = expression(''[15]*'p'[5]))
Of course you can edit the group colours, for example like this:
cols <- c(df_5k = "blue", df_5kz = "skyblue")
ggplot...
...
scale_fill_manual(values = cols) +
scale_color_manual(values = cols)
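Edit:
Since I don't know how to do what you want with base graphics, I would try to make the plot look like base graphics by using ggthemes::theme_base and changing the point shape like this: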
ggplot(nested, aes(Period, `15p5`)) +
  ggthemes::theme_base() +
  geom_hline(yintercept = 0.173, linetype = "dotted") +
  geom_vline(xintercept = c(-4700, -4000, -3000), linetype = "dotted") +
  geom_ribbon(aes(ymin = fit - 1.96 * se, ymax = fit + 1.96 * se, fill = origin),
              alpha = .2) +
  geom_point(shape = 1) +
  geom_line(aes(y = fit, colour = origin)) +
  scale_y_continuous(expand = c(.3, 0)) +
  scale_x_continuous(expand = c(.3, 0), breaks = scales::pretty_breaks(6)) +
  theme(legend.position = "bottom") +
  labs(x = "Years B.P.", y = expression(''[15]*'p'[5]),
       colour = NULL, fill = NULL)
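A side note on the 1.96 used in the ribbons above: it is the normal-approximation multiplier, whereas the code in the question uses qt(0.975, df) with the df returned by predict. Here is a quick sketch of how the two compare; the df values are made up for illustration and are not taken from the fitted models.

```r
# Illustrative degrees of freedom (assumed values, not taken from the fits above).
df <- c(5, 10, 30, 100)

t_mult <- qt(0.975, df)   # t-based multiplier, as in the question's code
z_mult <- qnorm(0.975)    # normal multiplier, approximately 1.96

# Side-by-side comparison of the two multipliers.
data.frame(df = df, t_mult = round(t_mult, 3), z_mult = round(z_mult, 3))
```

With few degrees of freedom the t-based band is wider, so if you want the ribbon to match the question's dashed lines you could use qt(0.975, df) * se in place of 1.96 * se.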
Data import
df_5k <- "Period 15p5 Total_5plus
-4350 0.100529101 12.6
-3900 0.4 20
-3650 0.0625 9.6
-3900 0.126984127 16.8
-3958 0.133333333 5
-4350 0.150943396 10.6
-3400 0.146341463 8.2
-3650 0.255319149 9.4
-3400 0.222222222 9
-3500 0.245014245 39
-3600 0.125 8
-3808 0.1 20
-3900 0.160493827 18
-3958 0.238095238 7
-4058 0.2 5
-3500 0.086956522 28.75
-4117 0.141414141 6.6
-4350 0.171038825 31.76666667
-4350 0.166666667 6
-3650 0.143798024 30.36666667
-2715 0.137931034 7.25
-4350 0.235588972 26.6
-3500 0.228840125 79.75
-4350 0.041666667 8
-3650 0.174757282 20.6
-2715 0.377777778 11.25
-3500 0.2 7.5
-3650 0.078947368 7.6
-3400 0.208333333 24
-4233 0.184027778 19.2
-3650 0.285714286 12.6
-4350 0.166666667 6"
df_5k <- read_table2(df_5k) %>%
mutate(origin = "df_5k")
df_5kz <- "Period 15p5 Total_5plus
-4350 0.100529101 12.6
-4350 0 5
-3900 0.4 20
-3650 0.0625 9.6
-3400 0 6
-3900 0.126984127 16.8
-3958 0.133333333 5
-4350 0.150943396 10.6
-3400 0.146341463 8.2
-3650 0.255319149 9.4
-3400 0.222222222 9
-3500 0.245014245 39
-3600 0.125 8
-3650 0 28
-3808 0.1 20
-3900 0.160493827 18
-3958 0.238095238 7
-4058 0.2 5
-3500 0 25
-3500 0.086956522 28.75
-4117 0.141414141 6.6
-4350 0.171038825 31.76666667
-4350 0.166666667 6
-3650 0.143798024 30.36666667
-2715 0.137931034 7.25
-4350 0.235588972 26.6
-3500 0.228840125 79.75
-4350 0.041666667 8
-3500 0 5
-3650 0.174757282 20.6
-3800 0 9
-2715 0.377777778 11.25
-3500 0.2 7.5
-3650 0.078947368 7.6
-4117 0 8
-4350 0 8
-3400 0.208333333 24
-4233 0.184027778 19.2
-3025 0 7
-3650 0.285714286 12.6
-4350 0.166666667 6"
df_5kz <- read_table2(df_5kz) %>%
mutate(origin = "df_5kz")
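For readers who want to stay in base R, the nest / map / unnest pattern used in this answer has a rough analogue in split / lapply / do.call(rbind). Below is a minimal sketch on simulated stand-in data; the column names x and y and the helper pred_one are illustrative and do not come from the answer.

```r
# Simulated stand-in data with a grouping column (not the question's data).
set.seed(42)
dat <- data.frame(origin = rep(c("a", "b"), each = 30),
                  x = rep(seq(0, 10, length.out = 30), 2))
dat$y <- sin(dat$x / 2) + ifelse(dat$origin == "b", 0.3, 0) + rnorm(60, sd = 0.2)

# Fit one loess per group and return the rows with fit and se attached.
pred_one <- function(d) {
  p <- predict(loess(y ~ x, d, span = 0.6), se = TRUE)
  cbind(d, fit = p$fit, se = p$se.fit)
}

# split -> lapply -> rbind plays the role of nest -> map -> unnest.
pieces <- lapply(split(dat, dat$origin), pred_one)
result <- do.call(rbind, pieces)
head(result)
```

The result is one long data frame with fit and se next to the original rows, which is the same shape the tidyverse pipeline produces.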
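Here is a base-graphics solution based on your code.
The trick with polygon is that you have to supply the x coordinates twice in one vector, once in normal order and once in reverse order (using the function rev), and you have to supply the y coordinates as one limit followed by the other limit in reverse order. The code below passes the lower limits first, then the reversed upper limits.
We use the adjustcolor function to make standard colours transparent.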
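To see the ordering on its own, here is a toy sketch with made-up numbers (x, lower and upper are illustrative). Note that x has to be sorted for the outline to come out right, which is why the code for the real data indexes everything with order() on Period.

```r
# Toy band: three sorted x positions with a lower and an upper limit.
x     <- c(1, 2, 3)
lower <- c(0.0, 0.5, 0.2)
upper <- c(1.0, 1.5, 1.2)

# Walk along one limit left to right, then back along the other right to left.
outline_x <- c(x, rev(x))
outline_y <- c(lower, rev(upper))

# Empty plot sized to the band, then the filled, semi-transparent polygon.
plot(range(x), range(lower, upper), type = "n", xlab = "x", ylab = "y")
polygon(outline_x, outline_y,
        col = adjustcolor("dodgerblue", alpha.f = 0.3), border = NA)
```

The full version for the question's data follows the same pattern.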
library(Hmisc)
ppi <- 300
par(mfrow = c(1,1), pty = "s", oma=c(1,2,1,1), mar=c(4,4,2,2))
plot(X15p5 ~ Period, Analysis5kz, xaxt = "n", yaxt= "n", ylim=c(-0.2,0.7), xlim=c(-5000,-2500), xlab = "Years B.P.", ylab = expression(''[15]*'p'[5]), main = "")
vx <- seq(-5000,-2000, by = 500)
vy <- seq(-0.2,0.7, by = 0.1)
axis(1, at = vx)
axis(2, at = vy)
a5k <- order(Analysis5k$Period)
a5kz <- order(Analysis5kz$Period)
Analysis5k.lo <- loess(X15p5 ~ Period, Analysis5k, weights = Total_5plus, span = 0.6)
Analysis5kz.lo <- loess(X15p5 ~ Period, Analysis5kz, weights = Total_5plus, span = 0.6)
pred5k <- predict(Analysis5k.lo, se = TRUE)
pred5kz <- predict(Analysis5kz.lo, se = TRUE)
polygon(x = c(Analysis5k$Period[a5k], rev(Analysis5k$Period[a5k])),
        y = c(pred5k$fit[a5k] - qt(0.975, pred5k$df)*pred5k$se.fit[a5k],
              rev(pred5k$fit[a5k] + qt(0.975, pred5k$df)*pred5k$se.fit[a5k])),
        col = adjustcolor("dodgerblue", alpha.f = 0.10), border = NA)
polygon(x = c(Analysis5kz$Period[a5kz], rev(Analysis5kz$Period[a5kz])),
        y = c(pred5kz$fit[a5kz] - qt(0.975, pred5kz$df)*pred5kz$se.fit[a5kz],
              rev(pred5kz$fit[a5kz] + qt(0.975, pred5kz$df)*pred5kz$se.fit[a5kz])),
        col = adjustcolor("orangered", alpha.f = 0.10), border = NA)
lines(Analysis5k$Period[a5k], pred5k$fit[a5k], col="dodgerblue", lwd=2)
lines(Analysis5kz$Period[a5kz], pred5kz$fit[a5kz], col="orangered", lwd=2)
abline(h=0.173, lty=3)
abline(v=-4700, lty=3)
abline(v=-4000, lty=3)
abline(v=-3000, lty=3)
minor.tick(nx=5, ny=4, tick.ratio=0.5)
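The question also asks about shading only the region shared by both intervals. One possible way is sketched here, under some assumptions: both fits are predicted on a common grid, and the shared band runs from the larger of the two lower limits to the smaller of the two upper limits. The data are simulated stand-ins, and loess_band is a hypothetical helper, not part of any package.

```r
# Simulated stand-in data for two groups (not the question's data).
set.seed(1)
d1 <- data.frame(x = runif(40, 0, 10))
d1$y <- sin(d1$x / 2) + rnorm(40, sd = 0.3)
d2 <- data.frame(x = runif(40, 0, 10))
d2$y <- sin(d2$x / 2) + 0.3 + rnorm(40, sd = 0.3)

# Hypothetical helper: t-based confidence band of a loess fit on a grid.
loess_band <- function(model, grid, level = 0.95) {
  p <- predict(model, newdata = data.frame(x = grid), se = TRUE)
  half <- qt(1 - (1 - level) / 2, p$df) * p$se.fit
  data.frame(x = grid, fit = p$fit, lower = p$fit - half, upper = p$fit + half)
}

# Common grid inside the x range covered by both data sets
# (loess does not extrapolate by default and returns NA outside the data).
grid <- seq(max(min(d1$x), min(d2$x)), min(max(d1$x), max(d2$x)),
            length.out = 200)
b1 <- loess_band(loess(y ~ x, d1, span = 0.6), grid)
b2 <- loess_band(loess(y ~ x, d2, span = 0.6), grid)

# Shared region: above both lower limits and below both upper limits.
shared_lower <- pmax(b1$lower, b2$lower)
shared_upper <- pmin(b1$upper, b2$upper)
# Where the bands do not overlap, collapse the region to zero height.
shared_upper <- pmax(shared_upper, shared_lower)

plot(y ~ x, rbind(d1, d2), type = "n")
polygon(c(grid, rev(grid)), c(shared_lower, rev(shared_upper)),
        col = adjustcolor("grey40", alpha.f = 0.4), border = NA)
lines(grid, b1$fit, col = "dodgerblue", lwd = 2)
lines(grid, b2$fit, col = "orangered", lwd = 2)
```

For the question's data you would swap in the two loess fits and a grid over Period. Predicting on a shared grid is what makes pmax and pmin line up point by point, since the two data sets do not have the same Period values.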