ggplot fails to draw smooth gam using facet_wrap and group aesthetic
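I am trying to draw a multi-panel, multi-line plot using ggplot with a group aesthetic together with facet_wrap. However, when one of the groups has too few data points, geom_smooth fails for all lines in that facet panel.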
plot1 <- ggplot(data = df1,
                aes(x = Year, y = Mean, group = Group2, linetype = Group2, shape = Group2)) +
  geom_errorbar(aes(ymin = Mean - SE, ymax = Mean + SE), width = 0.2) +
  geom_smooth(method = "gam", se = FALSE, formula = y ~ s(x, k = 3), size = 1, colour = "black") +
  # note: pd is not defined in this post (presumably a position_dodge() object)
  geom_point(position = pd, size = 2, fill = "white") +
  scale_x_continuous(limits = c(min(df1$Year - 0.1), max(df1$Year + 0.1)),
                     breaks = seq(min(df1$Year), max(df1$Year), 5)) +
  facet_wrap(~Group1, scales = "free", ncol = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(),
        axis.title.x = element_blank(),
        strip.background = element_blank(),
        axis.line.x = element_line(colour = "black"),
        axis.line.y = element_line(colour = "black"),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        legend.position = "top",
        legend.title = element_blank())
plot(plot1)
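This produces the plot below. The data are just summaries, to keep things simple. It is as if an error stops ggplot from calculating the smooth for any series in that particular facet.
Data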
Year Group1 Group2 Mean SE
2011 Factor A Factor C 30.62089116 3.672624771
2011 Factor A Factor D 54.99066324 2.822405771
2011 Factor B Factor C 30.48859003 3.748388489
2011 Factor B Factor D 45.70410611 4.284244405
2017 Factor A Factor C 33.68256601 4.030964172
2017 Factor A Factor D 53.43496462 4.687042033
2017 Factor B Factor C 23.08799875 5.17753488
2001 Factor A Factor C 23.79166667 2.837795432
2001 Factor A Factor D 23.75925926 3.688185081
2001 Factor B Factor C 29.05555556 4.08597798
2001 Factor B Factor D 28.13333333 7.877429079
2008 Factor A Factor C 23.3 2.383624691
2008 Factor A Factor D 28.83333333 2.750959429
2008 Factor B Factor C 34.01666667 5.340999698
And the plot: [figure omitted]
Clearly there is enough data to draw a smooth line for Factor C in the Factor B panel. Any ideas?
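I found this a tricky one. After some testing and reading through the current GitHub code for StatSmooth, I've summarised my findings as follows:
Observations
1. geom_smooth() fails to draw any smoothed line in a plot panel if any of the data groups has too few observations for method = "gam" AND formula = y ~ s(x, k = 3);
2. If the plot is facetted into multiple panels, only the panels containing an offending data group are affected;
3. This does not happen with formula = y ~ x (i.e. the default formula);
4. This does not happen with some other methods (e.g. "lm", "glm") using the default formula, but does happen with method = "loess";
5. This does not happen if the data group has only 1 observation.
We can reproduce the above with some simplified code: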
library(ggplot2)
library(dplyr)  # provides %>% and filter(), used in the explanations below

# create sample data
n <- 30
set.seed(567)
df.1 <- data.frame( # there is only 1 observation for group == B
  x = rnorm(n), y = rnorm(n),
  group = c(rep("A", n - 1), rep("B", 1)),
  facet = sample(c("X", "Y"), size = n, replace = TRUE))
set.seed(567)
df.2 <- data.frame( # there are 2 observations for group == B
  x = rnorm(n), y = rnorm(n),
  group = c(rep("A", n - 2), rep("B", 2)),
  facet = sample(c("X", "Y"), size = n, replace = TRUE))

# create base plot
p <- ggplot(df.2, aes(x = x, y = y, color = group)) +
  geom_point() + theme_bw()

# problem: no smoothed line at all in the entire plot
p + geom_smooth(method = "gam", formula = y ~ s(x, k = 3))

# problem: no smoothed line in the affected panel
p + facet_wrap(~ facet) +
  geom_smooth(method = "gam", formula = y ~ s(x, k = 3))

# no problem with default formula: smoothed lines in both facet panels
p + facet_wrap(~ facet) + geom_smooth(method = "gam")

# no problem with lm / glm, but problem with loess
p + facet_wrap(~ facet) + geom_smooth(method = "lm")
p + facet_wrap(~ facet) + geom_smooth(method = "glm")
p + facet_wrap(~ facet) + geom_smooth(method = "loess")

# no problem if there's only one observation (instead of two)
p %+% df.1 + geom_smooth(method = "gam", formula = y ~ s(x, k = 3))
p %+% df.1 + facet_wrap(~ facet) +
  geom_smooth(method = "gam", formula = y ~ s(x, k = 3))
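Explanation for observations 1 & 2:
I think the problem lies in the last two lines of StatSmooth's compute_group function. The first line calls the modelling function (e.g. stats::glm, stats::loess, mgcv::gam) on the data frame for each group specified by the aes(group = ...) mapping, while the second line calls one of the wrappers around stats::predict() to get the smoothed values (and the confidence interval, if applicable) for the model.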
model <- do.call(method, c(base.args, method.args))
predictdf(model, xseq, se, level)
When the parameters method = "gam", formula = y ~ s(x, k = 3) are used on a data frame with only 2 observations, this is what happens:
model <- do.call(mgcv::gam,
                 args = list(formula = y ~ s(x, k = 3),
                             data = df.2 %>% filter(group == "B" & facet == "X")))
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
A term has fewer unique covariate combinations than specified maximum
degrees of freedom
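model, the object meant to hold the result of do.call(...), never even gets created. The error is raised before the last line, predictdf(...), is reached, so there is no model to predict from. Without facetting, this affects all the computation done by StatSmooth, and geom_smooth() receives no usable data to create any geom in its layer. With facetting, the above computation is done separately for each facet, so only the facet(s) with problematic data are affected.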
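The failure can be reproduced outside ggplot2 as well. The following is a minimal sketch, assuming the mgcv package is installed; the data frames two_obs and ten_obs are made up for illustration and do not come from the question. Each fit is wrapped in try() so the script keeps running and we can check which one produced a model.

```r
library(mgcv)

# Hypothetical toy data: one group with 2 observations, one with 10
two_obs <- data.frame(x = c(1, 2), y = c(0.5, 1.5))
set.seed(1)
ten_obs <- data.frame(x = 1:10, y = rnorm(10))

# s(x, k = 3) asks for a basis of dimension 3,
# which 2 unique x values cannot support
fit_small <- try(gam(y ~ s(x, k = 3), data = two_obs), silent = TRUE)
fit_large <- try(gam(y ~ s(x, k = 3), data = ten_obs), silent = TRUE)

small_failed <- inherits(fit_small, "try-error")  # the fit stops with an error
large_failed <- inherits(fit_large, "try-error")  # a gam object is returned instead
```

Here small_failed flags the case that breaks the panel, while the larger group fits without trouble.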
Explanation for observations 3 & 4:
In addition to the above, if we don't specify a formula to replace the default y ~ x, we get a valid model object from gam / lm / glm, which can be passed to ggplot2's unexported predictdf function to obtain a data frame of predicted values:
model <- do.call(mgcv::gam, # or stats::lm, stats::glm
                 args = list(formula = y ~ x,
                             data = df.2 %>% filter(group == "B" & facet == "X")))
result <- ggplot2:::predictdf(
  model,
  xseq = seq(-2, 1.5, length.out = 80), # pseudo range of x-axis values
  se = FALSE, level = 0.95)             # default SE / level parameters
loess also returns a valid object, albeit with a lot of warnings. However, passing it to predictdf results in an error:
model <- do.call(stats::loess,
                 args = list(formula = y ~ x,
                             data = df.2 %>% filter(group == "B" & facet == "X")))
result <- ggplot2:::predictdf(
  model,
  xseq = seq(-2, 1.5, length.out = 80), # pseudo range of x-axis values
  se = FALSE, level = 0.95)             # default SE / level parameters
Error in predLoess(object$y, object$x, newx = if (is.null(newdata))
object$x else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : NA/NaN/Inf
in foreign function call (arg 5)
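Explanation for observation 5:
StatSmooth's compute_group function starts with the following:
if (length(unique(data$x)) < 2) {
  # Not enough data to perform fit
  return(data.frame())
}
In other words, if a given group has only 1 observation (more precisely, fewer than 2 unique x values), StatSmooth immediately returns an empty data frame. It therefore never reaches the later parts of the code that would throw an error.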
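Note that the check counts unique x values rather than rows. A small sketch of the same test in isolation (the helper name too_few_x is made up for illustration and is not part of ggplot2):

```r
# Mirrors the guard at the top of compute_group:
# TRUE means "return an empty data frame without fitting anything"
too_few_x <- function(x) length(unique(x)) < 2

one_obs      <- too_few_x(c(2011))        # one observation: guard triggers
repeated_x   <- too_few_x(c(2011, 2011))  # two rows, one unique x: guard still triggers
two_distinct <- too_few_x(c(2001, 2011))  # two unique x values: guard passes, model is fitted
```

This is why a group with exactly 2 distinct x values is the awkward case: it passes the guard but is still too small for s(x, k = 3).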
Workaround:
Having pinpointed where things go off track, we can tweak the compute_group code (see the annotated and commented-out portions):
new.compute_group <- function(
  data, scales, method = "auto", formula = y~x, se = TRUE, n = 80, span = 0.75,
  fullrange = FALSE, xseq = NULL, level = 0.95, method.args = list(), na.rm = FALSE) {

  if (length(unique(data$x)) < 2) return(data.frame())
  if (is.null(data$weight)) data$weight <- 1
  if (is.null(xseq)) {
    if (is.integer(data$x)) {
      if (fullrange) {
        xseq <- scales$x$dimension()
      } else {
        xseq <- sort(unique(data$x))
      }
    } else {
      if (fullrange) {
        range <- scales$x$dimension()
      } else {
        range <- range(data$x, na.rm = TRUE)
      }
      xseq <- seq(range[1], range[2], length.out = n)
    }
  }
  if (identical(method, "loess")) method.args$span <- span
  if (is.character(method)) method <- match.fun(method)
  base.args <- list(quote(formula), data = quote(data), weights = quote(weight))

  # if modelling fails, return empty data frame
  # model <- do.call(method, c(base.args, method.args))
  model <- try(do.call(method, c(base.args, method.args)))
  if (inherits(model, "try-error")) return(data.frame())

  # if modelling didn't fail, but prediction returns NA,
  # also return empty data frame
  # predictdf(model, xseq, se, level)
  pred <- try(ggplot2:::predictdf(model, xseq, se, level))
  if (inherits(pred, "try-error")) return(data.frame())

  return(pred)
}
Define a new stat layer that uses this version:
# same as stat_smooth() except that it uses stat = StatSmooth2, rather
# than StatSmooth
stat_smooth_local <- function(
  mapping = NULL, data = NULL, geom = "smooth", position = "identity", ...,
  method = "auto", formula = y ~ x, se = TRUE, n = 80, span = 0.75,
  fullrange = FALSE, level = 0.95, method.args = list(), na.rm = FALSE,
  show.legend = NA, inherit.aes = TRUE) {
  layer(
    data = data, mapping = mapping, stat = StatSmooth2,
    geom = geom, position = position, show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      method = method, formula = formula, se = se, n = n,
      fullrange = fullrange, level = level, na.rm = na.rm,
      method.args = method.args, span = span, ...
    )
  )
}

# inherit from StatSmooth
StatSmooth2 <- ggproto(
  "StatSmooth2", ggplot2::StatSmooth,
  compute_group = new.compute_group
)
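The two try() guards in new.compute_group can be looked at in isolation. Below is a minimal base-R sketch of the same pattern; safe_predict, fit_ok and fit_bad are hypothetical names, the toy data is made up, and stop() merely stands in for the gam failure. Unlike the version above, it passes silent = TRUE so the error messages are suppressed.

```r
# Fit a model and predict at xseq; return an empty data frame if either step fails
safe_predict <- function(fit_fun, data, xseq) {
  model <- try(fit_fun(data), silent = TRUE)
  if (inherits(model, "try-error")) return(data.frame())

  pred <- try(predict(model, newdata = data.frame(x = xseq)), silent = TRUE)
  if (inherits(pred, "try-error")) return(data.frame())

  data.frame(x = xseq, y = as.numeric(pred))
}

# Hypothetical toy data
toy <- data.frame(x = 1:6, y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

fit_ok  <- function(d) lm(y ~ x, data = d)
fit_bad <- function(d) stop("too few observations")  # stands in for the gam failure

res_ok  <- safe_predict(fit_ok,  toy, xseq = c(1.5, 2.5, 3.5))  # 3 predicted rows
res_bad <- safe_predict(fit_bad, toy, xseq = c(1.5, 2.5, 3.5))  # empty data frame
```

Returning an empty data frame rather than letting the error propagate is what allows the other groups in the same panel to be drawn.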
Result:
We can run through the same cases as before, replacing geom_smooth() with stat_smooth_local(), and verify that the smoothed geom layers are visible in every case (note that some will still produce error messages):
# problem resolved: smoothed line for applicable group in the entire plot
p + stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))

# problem resolved: smoothed line for applicable group in the affected panel
p + facet_wrap(~ facet) +
  stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))

# still no problem with default formula
p + facet_wrap(~ facet) + stat_smooth_local(method = "gam")

# still no problem with lm / glm; problem resolved for loess
p + facet_wrap(~ facet) + stat_smooth_local(method = "lm")
p + facet_wrap(~ facet) + stat_smooth_local(method = "glm")
p + facet_wrap(~ facet) + stat_smooth_local(method = "loess")

# still no problem if there's only one observation (instead of two)
p %+% df.1 + stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))
p %+% df.1 + facet_wrap(~ facet) +
  stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))

# showing one pair of contrasts here
cowplot::plot_grid(
  p + facet_wrap(~ facet) + ggtitle("Before") +
    geom_smooth(method = "gam", formula = y ~ s(x, k = 3)),
  p + facet_wrap(~ facet) + ggtitle("After") +
    stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3)),
  nrow = 2
)
A very simple way to deal with this is to subset out the rows that cause the problem in the data passed to geom_smooth:
library(tidyverse)

df1 <- tibble(  # data_frame() is deprecated in favour of tibble()
  Year = c(2011L, 2011L, 2011L, 2011L, 2017L, 2017L, 2017L, 2001L, 2001L, 2001L, 2001L, 2008L, 2008L, 2008L),
  Group1 = c("Factor A", "Factor A", "Factor B", "Factor B", "Factor A", "Factor A", "Factor B", "Factor A", "Factor A", "Factor B", "Factor B", "Factor A", "Factor A", "Factor B"),
  Group2 = c("Factor C", "Factor D", "Factor C", "Factor D", "Factor C", "Factor D", "Factor C", "Factor C", "Factor D", "Factor C", "Factor D", "Factor C", "Factor D", "Factor C"),
  Mean = c(30.62089116, 54.99066324, 30.48859003, 45.70410611, 33.68256601, 53.43496462, 23.08799875, 23.79166667, 23.75925926, 29.05555556, 28.13333333, 23.3, 28.83333333, 34.01666667),
  SE = c(3.672624771, 2.822405771, 3.748388489, 4.284244405, 4.030964172, 4.687042033, 5.17753488, 2.837795432, 3.688185081, 4.08597798, 7.877429079, 2.383624691, 2.750959429, 5.340999698)
)

ggplot(df1, aes(Year, Mean, color = Group2)) +
  geom_errorbar(aes(ymin = Mean - SE, ymax = Mean + SE)) +
  geom_smooth(data = df1 %>% group_by(Group1, Group2) %>% filter(n() > 2), # subset
              method = "gam", formula = y ~ s(x, k = 3)) +
  geom_point() +
  facet_wrap(~Group1)
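The subset can also be tied to the basis dimension rather than a hard-coded row count. A minimal base-R sketch follows, where keep_fittable and its arguments are hypothetical names, and the threshold of k unique x values per group is an assumption based on the error message quoted earlier. The toy data uses rounded values from the Factor B panel of the question.

```r
# Keep only rows whose group has at least k unique x values
keep_fittable <- function(data, x, groups, k = 3) {
  # one key per combination of the grouping columns
  key <- do.call(paste, c(data[groups], sep = "\r"))
  # number of unique x values in each row's group
  n_unique <- ave(data[[x]], key, FUN = function(v) length(unique(v)))
  data[n_unique >= k, , drop = FALSE]
}

# Toy data: Factor C has 4 years, Factor D only 2
toy <- data.frame(
  Year   = c(2001, 2008, 2011, 2017, 2001, 2011),
  Group1 = "Factor B",
  Group2 = c(rep("Factor C", 4), rep("Factor D", 2)),
  Mean   = c(29.1, 34.0, 30.5, 23.1, 28.1, 45.7)
)

kept <- keep_fittable(toy, x = "Year", groups = c("Group1", "Group2"))
```

The result could then be supplied as data = keep_fittable(df1, "Year", c("Group1", "Group2")) in the geom_smooth() call, leaving the points and error bars of the small group untouched.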