R GAM producing different results via group argument
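I have hourly data covering all 24 hours of the day for a whole year, split into 7 groups. When I fit a GAM (mgcv::gam) with the by= argument to get 7 different fitted lines, some of the fits look very strange. However, when I subset the data to one of these groups and run the GAM again without by=Group, the fit looks much better and makes sense.
Here is a toy example in which the difference between the two approaches is not very pronounced. With my real data the difference when using by= is much larger. Why does this happen?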
library(data.table)
library(mgcv)
library(ggplot2)
set.seed(1)  # make the random toy data reproducible
## create two groups of data, A & B (12 time points x 100 observations each)
dtA <- data.table(t = rep(1:12, each = 100),
                  N = c(runif(200, 0.0, 1.0), runif(200, 2.0, 3.0), runif(200, 5.0, 7.0),
                        runif(200, 4.0, 5.0), runif(200, 1.0, 2.0), runif(200, 0.0, 1.0)),
                  Group = "A")
dtB <- data.table(t = rep(1:12, each = 100),
                  N = c(runif(200, 20.0, 22.0), runif(200, 14.0, 16.0), runif(200, 6.0, 7.0),
                        runif(200, 5.0, 6.0), runif(200, 12.0, 15.0), runif(200, 17.0, 20.0)),
                  Group = "B")
## put the data together, set the group as a factor
dt_gp <- rbindlist(list(dtA, dtB), use.names = TRUE)
dt_gp[, Group := factor(Group, levels = c("A", "B"))]
## fit the GAM with one smooth per group via by=, then predict on a blank grid
gam1 <- gam(N ~ s(t, k = 8, bs = "cc", by = Group), data = dt_gp)
dt_fit1 <- data.table(t = rep(1:12, 2), Group = rep(c("A", "B"), each = 12))
dt_fit1[, Group := factor(Group, levels = c("A", "B"))]
fits1 <- predict(gam1, newdata = dt_fit1, type = "response", se.fit = TRUE)
predicts1 <- as.data.table(data.frame(dt_fit1, fits1))
## now subset the Group A data, refit the GAM without by=, and predict again
dt <- dt_gp[Group == "A"]
dt[, Group := NULL]
gam2 <- gam(N ~ s(t, k = 8, bs = "cc"), data = dt)
dt_fit2 <- data.table(t = 1:12)
fits2 <- predict(gam2, newdata = dt_fit2, type = "response", se.fit = TRUE)
predicts2 <- as.data.table(data.frame(dt_fit2, fits2))
## plot to see the difference (add Group to the 2nd prediction so it facets)
predicts2[, Group := "A"]
ggplot() +
  geom_line(data = predicts1, aes(x = t, y = fit), colour = "blue") +
  geom_line(data = predicts2, aes(x = t, y = fit), colour = "red") +
  geom_point(data = dt_gp, aes(x = t, y = N), colour = "grey50") +
  facet_wrap(~Group, nrow = 2, scales = "free_y") +
  ggtitle("GAM on numbers grouped by A & B (numbers in A identical in both cases)") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        axis.title = element_text(size = 16),
        legend.title = element_blank())
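The red line is the fit to my subset data; the blue line is the grouped fit. Doesn't the grouping feature in mgcv::gam() split the data? The more 'different' I make A and B, the worse the blue line fits the original data points.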
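One way to check whether by= really splits the data is to look at the model matrix of the grouped fit. The sketch below is an illustration, not part of the original question: it rebuilds toy data shaped like the example above in base R (the seed and the helper `make_N` are arbitrary, hypothetical choices) and inspects `predict(..., type = "lpmatrix")`. Each group's smooth columns should be zero on the other group's rows, but both groups share a single `(Intercept)` column, so the data are not split into fully independent models.

```r
library(mgcv)

set.seed(1)  # arbitrary seed, only for reproducibility
# Hypothetical helper: 200 uniform draws per (lo, hi) pair,
# i.e. 2 time points x 100 observations per pair, as in the question
make_N <- function(lo, hi) unlist(Map(function(a, b) runif(200, a, b), lo, hi))
tt <- rep(1:12, each = 100)
dat <- data.frame(
  t = c(tt, tt),
  N = c(make_N(c(0, 2, 5, 4, 1, 0), c(1, 3, 7, 5, 2, 1)),
        make_N(c(20, 14, 6, 5, 12, 17), c(22, 16, 7, 6, 15, 20))),
  Group = factor(rep(c("A", "B"), each = 1200), levels = c("A", "B"))
)

# The question's model: one smooth per Group level, no Group main effect
gam_by <- gam(N ~ s(t, k = 8, bs = "cc", by = Group), data = dat)

# Linear predictor matrix: one column per model coefficient
X <- predict(gam_by, type = "lpmatrix")
colsA <- grep("GroupA", colnames(X))
colsB <- grep("GroupB", colnames(X))
isB <- dat$Group == "B"

# Each group's smooth columns are zero on the other group's rows...
print(all(X[isB, colsA] == 0))
print(all(X[!isB, colsB] == 0))
# ...but a single intercept column is shared by every row of both groups
print(grep("Intercept", colnames(X), value = TRUE))
```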
From the documentation of the s function in mgcv:
In the factor by variable case a replicate of the smooth is produced for each factor level (these smooths will be centered, so the factor usually needs to be added as a main effect as well). See gam.models for further details.
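The centering mentioned in the documentation is what produces the poor blue fit. The sketch below rebuilds the same hypothetical toy data (so the block runs on its own) and compares observed and fitted group means for the question's model. Assuming mgcv's default sum-to-zero centering constraint, each by= smooth averages to zero over its group's data, and the only unpenalized term is the shared intercept, so both groups' fitted values should average to the overall mean of N rather than to their own group means.

```r
library(mgcv)

set.seed(1)  # arbitrary seed, only for reproducibility
# Hypothetical helper: 200 uniform draws per (lo, hi) pair
make_N <- function(lo, hi) unlist(Map(function(a, b) runif(200, a, b), lo, hi))
tt <- rep(1:12, each = 100)
dat <- data.frame(
  t = c(tt, tt),
  N = c(make_N(c(0, 2, 5, 4, 1, 0), c(1, 3, 7, 5, 2, 1)),
        make_N(c(20, 14, 6, 5, 12, 17), c(22, 16, 7, 6, 15, 20))),
  Group = factor(rep(c("A", "B"), each = 1200), levels = c("A", "B"))
)

# The question's model: centered by= smooths, no Group main effect
gam_by <- gam(N ~ s(t, k = 8, bs = "cc", by = Group), data = dat)

obs_means <- tapply(dat$N, dat$Group, mean)
fit_means <- tapply(fitted(gam_by), dat$Group, mean)

# Observed means are far apart (about 2.6 for A and 13.3 for B in expectation),
# while both fitted means sit at the shared intercept, i.e. the overall mean of N
print(rbind(observed = obs_means, fitted = fit_means))
print(c(intercept = unname(coef(gam_by)["(Intercept)"]), overall_mean = mean(dat$N)))
```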
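So it looks like you also want to include Group in the formula outside the call to s, for example:
gam1 <- gam(N ~ Group + s(t, k = 8, bs = "cc", by = Group), data = dt_gp)
Without the Group main effect, each group's smooth is centered and the model has only one shared intercept, so both fitted curves are pulled to the same average level. The model cannot represent the difference in mean between A and B, which is why the blue line fits worse the more 'different' A and B are.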
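A minimal sketch of the suggested fix on the same hypothetical toy data (rebuilt so the block runs on its own; object names are illustrative). The Group main effect now carries each group's level and the centered by= smooths carry each group's shape. Because the intercept and the Group term are unpenalized, the fitted values should average to each group's own observed mean, and the Group A curve can be compared with a model fitted to the Group A subset alone (the question's red line).

```r
library(mgcv)

set.seed(1)  # arbitrary seed, only for reproducibility
# Hypothetical helper: 200 uniform draws per (lo, hi) pair
make_N <- function(lo, hi) unlist(Map(function(a, b) runif(200, a, b), lo, hi))
tt <- rep(1:12, each = 100)
dat <- data.frame(
  t = c(tt, tt),
  N = c(make_N(c(0, 2, 5, 4, 1, 0), c(1, 3, 7, 5, 2, 1)),
        make_N(c(20, 14, 6, 5, 12, 17), c(22, 16, 7, 6, 15, 20))),
  Group = factor(rep(c("A", "B"), each = 1200), levels = c("A", "B"))
)

# Corrected model: Group main effect plus centered by= smooths
gam_fix <- gam(N ~ Group + s(t, k = 8, bs = "cc", by = Group), data = dat)
# Separate model on the Group A subset only (the question's red line)
gam_A <- gam(N ~ s(t, k = 8, bs = "cc"), data = subset(dat, Group == "A"))

# Predict both models for Group A on the 12 time points
newA <- data.frame(t = 1:12, Group = factor("A", levels = c("A", "B")))
pred_fix <- predict(gam_fix, newdata = newA)
pred_A <- predict(gam_A, newdata = newA)
print(max(abs(pred_fix - pred_A)))

# Fitted group means now reproduce the observed group means
print(rbind(observed = tapply(dat$N, dat$Group, mean),
            fitted = tapply(fitted(gam_fix), dat$Group, mean)))
```

The two Group A curves should be close but need not coincide exactly: in the by= model the smoothing parameters are chosen by a single criterion (GCV by default for a Gaussian model) computed over all groups, which implicitly assumes one residual variance for both, whereas the subset fit uses Group A alone.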