mgcv GAM:“by”参数中有多个变量(平滑变化超过 1 个因子)
mgcv GAM: more than one variable in `by` argument (smooth varying by more than 1 factor)
我需要对不止一个因素的平滑项建模。 by
参数允许我为每个因子水平建模一个平滑模型,但我找不到如何在多个因子上做到这一点。
我尝试了类似于以下的解决方案,但没有成功:
data <- iris
data$factor2 <- rep(c("A", "B"), 75)
mgcv::gam(Sepal.Length ~ s(Petal.Length, by = c(Species, factor2)), data = data)
#> Error in model.frame.default(formula = Sepal.Length ~ 1 + Petal.Length + : variable lengths differ (found for 'c(Species, factor2)')
由 reprex package (v2.0.0)
于 2021-08-05 创建
欢迎任何帮助!
您可以使用 interaction
函数创建一个具有 6 个级别的交互变量:
data$Sp.by.fac2 <- with(data, interaction(Species,factor2,
drop=TRUE)) # 'drop' addresses Gavin's concern
library(mgcv)
gam(Sepal.Length ~ s(Petal.Length, by = Sp.by.fac2), data = data)
Family: gaussian
Link function: identity
Formula:
Sepal.Length ~ s(Petal.Length, by = Sp.by.fac2)
Estimated degrees of freedom:
1.42 3.04 1.00 1.82 2.28 1.00 total = 11.55
GCV score: 0.1209707
我在使用 rms+Hmisc
包组合构建有意义的二维样条曲面方面也取得了成功。那些用格子图绘制得很好。
You can make an interaction variable
手动创建交互变量的问题是它使后处理(解释和绘图)过程复杂化。
我尝试在公式中指定两个单独的平滑项,returns 不同的平滑项针对不同的级别组合。但它似乎(毫不奇怪)它没有考虑相互作用,即它计算了 by=Species
和 by=factor2
.
的“主效应”
set.seed(1)
data <- iris
data$factor2 <- as.factor(sample(c("A", "A", "B"), 150, replace = T))
data$Petal.Length <- sqrt(data$Petal.Length)
m <- mgcv::gam(Sepal.Length ~ s(Petal.Length, by = Species) + s(Petal.Length, by = factor2), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = FALSE))
由 reprex package (v2.0.0)
于 2021-08-06 创建
我确实尝试过使用张量积,但它似乎不适合以下因素:
m <- mgcv::gam(Sepal.Length ~ t2(Petal.Length, factor2, by = Species), data = data)
#> Error in smooth.construct.cr.smooth.spec(object$margin[[i]], dat, knt): factor2 has insufficient unique values to support 5 knots: reduce k.
由 reprex package (v2.0.0)
于 2021-08-06 创建
interaction()
造成的问题之一是它改变了模型的矩阵,这意味着模型数据中包含的一些变量发生了变化:
m <- mgcv::gam(body_mass_g ~ s(flipper_length_mm, by = interaction(species, sex)), data = palmerpenguins::penguins)
head(insight::get_data(m))
#> body_mass_g flipper_length_mm species sex
#> 1 3750 181 Adelie.male male
#> 2 3800 186 Adelie.female female
#> 3 3250 195 Adelie.female female
#> 5 3450 193 Adelie.female female
#> 6 3650 190 Adelie.male male
#> 7 3625 181 Adelie.female female
由 reprex package (v2.0.1)
于 2021-08-06 创建
这可能会导致使用后处理函数时出现一些问题,例如用于可视化。
但是,根据 Gavin 和 IRTFM 的回答,可以通过将变量作为固定效应添加到模型中轻松解决这个问题。
这是一个演示,还说明了两个单独的平滑和交互之间的差异:
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
set.seed(1)
# Create data
data <- data.frame(x = rep(seq(-10, 10, length.out = 500), 2),
fac1 = as.factor(rep(c("A", "B", "C"), length.out = 1000)),
fac2 = as.factor(rep(c("X", "Y"), each = 500)))
data$y <- data$x^2 + rnorm(nrow(data), sd = 5)
data$y[data$fac1 == "A"] <- sign(data$x[data$fac1 == "A"]) * data$y[data$fac1 == "A"] + 50
data$y[data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac1 == "B"]^3, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "C"] <- data$y[data$fac2 == "X" & data$fac1 == "C"] - 100
data$y[data$fac2 == "X" & data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "B"] ^ 2, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "A"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "A"] * -3, c(0, 100))
# Real trends
ggplot(data, aes(x = x, y = y, color = fac1, shape = fac2)) +
geom_point()
# Two smooths
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = fac1) + s(x, by = fac2), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
# Interaction
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = interaction(fac1, fac2)), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
由 reprex package (v2.0.1)
于 2021-08-06 创建
最后一个模型设法恢复了每个因素组合的趋势。
我需要对不止一个因素的平滑项建模。 by
参数允许我为每个因子水平建模一个平滑模型,但我找不到如何在多个因子上做到这一点。
我尝试了类似于以下的解决方案,但没有成功:
data <- iris
data$factor2 <- rep(c("A", "B"), 75)
mgcv::gam(Sepal.Length ~ s(Petal.Length, by = c(Species, factor2)), data = data)
#> Error in model.frame.default(formula = Sepal.Length ~ 1 + Petal.Length + : variable lengths differ (found for 'c(Species, factor2)')
由 reprex package (v2.0.0)
于 2021-08-05 创建欢迎任何帮助!
您可以使用 interaction
函数创建一个具有 6 个级别的交互变量:
data$Sp.by.fac2 <- with(data, interaction(Species,factor2,
drop=TRUE)) # 'drop' addresses Gavin's concern
library(mgcv)
gam(Sepal.Length ~ s(Petal.Length, by = Sp.by.fac2), data = data)
Family: gaussian
Link function: identity
Formula:
Sepal.Length ~ s(Petal.Length, by = Sp.by.fac2)
Estimated degrees of freedom:
1.42 3.04 1.00 1.82 2.28 1.00 total = 11.55
GCV score: 0.1209707
我在使用 rms+Hmisc
包组合构建有意义的二维样条曲面方面也取得了成功。那些用格子图绘制得很好。
You can make an interaction variable
手动创建交互变量的问题是它使后处理(解释和绘图)过程复杂化。
我尝试在公式中指定两个单独的平滑项,returns 不同的平滑项针对不同的级别组合。但它似乎(毫不奇怪)它没有考虑相互作用,即它计算了 by=Species
和 by=factor2
.
set.seed(1)
data <- iris
data$factor2 <- as.factor(sample(c("A", "A", "B"), 150, replace = T))
data$Petal.Length <- sqrt(data$Petal.Length)
m <- mgcv::gam(Sepal.Length ~ s(Petal.Length, by = Species) + s(Petal.Length, by = factor2), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = FALSE))
由 reprex package (v2.0.0)
于 2021-08-06 创建我确实尝试过使用张量积,但它似乎不适合以下因素:
m <- mgcv::gam(Sepal.Length ~ t2(Petal.Length, factor2, by = Species), data = data)
#> Error in smooth.construct.cr.smooth.spec(object$margin[[i]], dat, knt): factor2 has insufficient unique values to support 5 knots: reduce k.
由 reprex package (v2.0.0)
于 2021-08-06 创建interaction()
造成的问题之一是它改变了模型的矩阵,这意味着模型数据中包含的一些变量发生了变化:
m <- mgcv::gam(body_mass_g ~ s(flipper_length_mm, by = interaction(species, sex)), data = palmerpenguins::penguins)
head(insight::get_data(m))
#> body_mass_g flipper_length_mm species sex
#> 1 3750 181 Adelie.male male
#> 2 3800 186 Adelie.female female
#> 3 3250 195 Adelie.female female
#> 5 3450 193 Adelie.female female
#> 6 3650 190 Adelie.male male
#> 7 3625 181 Adelie.female female
由 reprex package (v2.0.1)
于 2021-08-06 创建这可能会导致使用后处理函数时出现一些问题,例如用于可视化。
但是,根据 Gavin 和 IRTFM 的回答,可以通过将变量作为固定效应添加到模型中轻松解决这个问题。
这是一个演示,还说明了两个单独的平滑和交互之间的差异:
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
set.seed(1)
# Create data
data <- data.frame(x = rep(seq(-10, 10, length.out = 500), 2),
fac1 = as.factor(rep(c("A", "B", "C"), length.out = 1000)),
fac2 = as.factor(rep(c("X", "Y"), each = 500)))
data$y <- data$x^2 + rnorm(nrow(data), sd = 5)
data$y[data$fac1 == "A"] <- sign(data$x[data$fac1 == "A"]) * data$y[data$fac1 == "A"] + 50
data$y[data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac1 == "B"]^3, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "C"] <- data$y[data$fac2 == "X" & data$fac1 == "C"] - 100
data$y[data$fac2 == "X" & data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "B"] ^ 2, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "A"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "A"] * -3, c(0, 100))
# Real trends
ggplot(data, aes(x = x, y = y, color = fac1, shape = fac2)) +
geom_point()
# Two smooths
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = fac1) + s(x, by = fac2), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
# Interaction
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = interaction(fac1, fac2)), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
由 reprex package (v2.0.1)
于 2021-08-06 创建最后一个模型设法恢复了每个因素组合的趋势。