How to fit a generalized additive model with gam() so that all columns are always used as predictors (no hard-coded terms in the model fitting)

I have a training data table in R whose columns vary from one run to the next. For example, right now the data table has the following column names:

library(mgcv)
## column names of the training data table, shown here as a character vector
dt.train <- c("DE", "DEWind", "DESolar", "DEConsumption", "DETemperature", 
              "DENuclear", "DELignite")

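Now I want to fit a generalized additive model (GAM) with integrated smoothness estimation that predicts the DE price. At the moment I fit the model as follows: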

fitModel <- mgcv::gam(DE ~ s(DEWind)+s(DESolar)+s(DEConsumption)+s(DETemperature)+
                           s(DENuclear)+s(DELignite), 
                      data = dt.train)

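The column names are currently hard-coded, but I don't want to edit them every time. I want the program to detect which columns are present and fit the model with whatever columns exist. So I would like something like this (which works for stats::lm() and stats::glm()):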

fitModel <- mgcv::gam(DE ~ .-1, data = dt.train)

Unfortunately, this does not work with gam().

I wouldn't recommend doing this, for statistical reasons, but...

nms <- c("DE", "DEWind", "DESolar", "DEConsumption", "DETemperature", 
              "DENuclear", "DELignite")
## typically you'd get those names as
## nms <- names(dt.train)

## identify the response
resp <- 'DE'
## filter out response from `nms`
nms <- nms[nms != resp]
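
As a rough sketch of how the names could be pulled from the data itself rather than typed out: the block below uses a small made-up data frame, dt.demo (a hypothetical stand-in for dt.train, with invented values), and also drops non-numeric columns, on the assumption that only numeric columns should end up inside s().

```r
## Hypothetical stand-in for dt.train; the values are invented for illustration
dt.demo <- data.frame(DE      = c(30.1, 28.4, 35.0, 31.2),
                      DEWind  = c(12.0, 15.5, 9.8, 11.1),
                      DESolar = c(3.2, 4.1, 2.7, 3.9),
                      Region  = c("N", "S", "N", "S"))

## identify the response and take every other column as a candidate predictor
resp <- 'DE'
nms <- setdiff(names(dt.demo), resp)

## assumption: keep only numeric columns, since these are the ones wrapped in s()
## ([[ ]] indexing works for both data.frame and data.table objects)
is_num <- vapply(nms, function(n) is.numeric(dt.demo[[n]]), logical(1))
nms <- nms[is_num]
nms
```

Whether non-numeric columns should be dropped, or handled as ordinary parametric terms instead, depends on your data; the filter here is only one possible choice.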

Create the right-hand side of the formula by pasting on the s( and ) bits, and concatenating the resulting strings separated by +:

rhs <- paste('s(', nms, ')', sep = '', collapse = ' + ')

This gives us

> rhs
[1] "s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + s(DENuclear) + s(DELignite)"
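
As an aside, stats::reformulate() can take the individual term labels and the response and return a formula object in one step, which avoids building the string by hand. A minimal sketch, assuming the same nms and resp as above (fml2 is just a name chosen here for illustration):

```r
## Term labels and response as in the answer above
nms <- c("DEWind", "DESolar", "DEConsumption", "DETemperature",
         "DENuclear", "DELignite")
resp <- 'DE'

## reformulate() joins the term labels with + and adds the response on the left
fml2 <- reformulate(termlabels = paste0('s(', nms, ')'), response = resp)
fml2
```

The step-by-step paste() route continues below; both should end up with an equivalent formula.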

Then you can add the response and ~:

fml <- paste(resp, '~', rhs, collapse = ' ')

This gives

> fml
[1] "DE ~ s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + s(DENuclear) + s(DELignite)"

Finally, coerce it to a formula object:

fml <- as.formula(fml)

which gives

> fml
DE ~ s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + 
    s(DENuclear) + s(DELignite)
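
To round this off, here is a sketch of how the steps above might be wrapped up and passed to gam(). The helper make_gam_formula() and the simulated data frame dt.sim are hypothetical, made up purely for illustration; with real data you would pass dt.train instead.

```r
## Simulated stand-in for dt.train; columns and values are invented
set.seed(1)
n <- 200
dt.sim <- data.frame(DEWind = runif(n), DESolar = runif(n),
                     DETemperature = runif(n))
dt.sim$DE <- sin(2 * pi * dt.sim$DEWind) + dt.sim$DESolar^2 +
  rnorm(n, sd = 0.2)

## Hypothetical helper: every column except the response becomes s(column)
make_gam_formula <- function(data, resp) {
  nms <- setdiff(names(data), resp)
  rhs <- paste('s(', nms, ')', sep = '', collapse = ' + ')
  as.formula(paste(resp, '~', rhs))
}

fml.sim <- make_gam_formula(dt.sim, 'DE')
fml.sim

## Fit only if mgcv is installed
if (requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
  fitModel <- gam(fml.sim, data = dt.sim)
  print(summary(fitModel))
}
```

The earlier caveat still applies: smoothing every available column automatically is convenient, but it is not something I'd do without looking at the data first.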