Why is gam::step.gam returning NULL with forward selection?

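I have a large generalized additive model (GAM) fitted to 10K observations and about 100 variables. Running forward stepwise selection on it returns an object of class "NULL". Why does this happen, and how can I fix it?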

library(gam)

load(url("https://github.com/cornejom/DataSets/raw/master/mydata.Rdata"))
load(url("https://github.com/cornejom/DataSets/raw/master/mygam.Rdata"))

myscope <- gam.scope(mydata, response = 3, arg = "df=4") #Target var in 3rd col.
mygam.step <- step.gam(mygam, myscope, direction = "forward")

mygam.step
NULL

The code used to fit mygam from mydata is:

library(gam)

#Identify numerical variables, but exclude the integer response.
numbers = sapply(mydata, class) %in% c("integer", "numeric")  
numbers[match("Response", names(mydata))] = FALSE 

#Identify factor variables.
factors = sapply(mydata, class) == "factor"

#Create a formula to feed into gam function.
myformula = paste0(paste0("Response ~ ", 
                          paste0("s(", names(mydata)[numbers], ", df=4)", collapse = " + ")
                          ),
                   " + ",
                   paste0(paste0(names(mydata)[factors], collapse = " + ")))

mygam = gam(as.formula(myformula), family = "binomial", mydata)

I suspect the problem lies in the mygam object.

Explanation

If you read help(step.gam), the description of the scope argument includes this passage:

The supplied model ‘object’ is used as the starting model, and hence there is the requirement that one term from each of the term formulas be present in ‘formula(object)’. This also implies that any terms in ‘formula(object)’ not contained in any of the term formulas will be forced to be present in every model considered. The function ‘gam.scope’ is helpful for generating the scope argument for a large model.

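In essence, this says that the first argument passed to step.gam (here, mygam) carries a formula, and that formula is used as the starting model for the stepwise procedure.

Because this is forward stepwise selection, it cannot start from the full model: there would be nothing left to add.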
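To see why the full model leaves nothing to add, here is a minimal base R sketch. The formula and the three-variable scope are hypothetical stand-ins for mygam and myscope (x1, x2 and f1 are made-up names), and the scope layout assumes what the gam.scope documentation describes: ~ 1 + x + s(x, df=4) for a numeric variable and ~ 1 + x for a factor.

```r
# Hypothetical starting formula shaped like the one in the question.
start <- Response ~ s(x1, df = 4) + s(x2, df = 4) + f1

# Hypothetical scope: one one-sided formula per variable, ordered from
# simplest to most complex option (assumed gam.scope layout).
scope <- list(
  x1 = ~ 1 + x1 + s(x1, df = 4),
  x2 = ~ 1 + x2 + s(x2, df = 4),
  f1 = ~ 1 + f1
)

# Terms present in the starting model.
start_terms <- attr(terms(start), "term.labels")

# For each scope entry, take its last (most complex) option.
richest <- vapply(scope, function(f) {
  labs <- attr(terms(f), "term.labels")
  labs[length(labs)]
}, character(1))

# TRUE means every variable already sits at the top of its scope,
# so a forward search has no step available.
all(richest %in% start_terms)
```

With the real objects, comparing attr(terms(formula(mygam)), "term.labels") against the entries of myscope should give the same kind of picture.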

Exploring the code

Looking at the source reinforces this idea. The code of step.gam contains this block, which runs in the forward-selection case.

if (forward) {
    trial <- items
    trial[i] <- trial[i] + 1
    if (trial[i] <= term.lengths[i] && !get.visit(trial,
      visited)) {
      visited <- cbind(visited, trial)
      tform.vector <- form.vector
      tform.vector[i] <- scope[[i]][trial[i]]
      form.list = c(form.list, list(list(trial = trial,
        form.vector = tform.vector, which = i)))
    }
}

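Note that the body of this block only does anything when the inner if statement is TRUE. That if statement appears to check whether the scope still offers, for term i, an option beyond the one currently in the model: trial[i] is the current position (items[i]) plus one, and it must not exceed term.lengths[i], the number of options for that term. It also requires that the resulting combination has not been visited yet. If either condition fails, the term is skipped.

In your case every term in mygam is already at the last option of its scope, so the condition is never TRUE, no candidate model is ever formed, no return object is built, and the procedure returns NULL.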
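The check can be reproduced in plain base R without the gam package. This is only a sketch of the candidate-counting logic under assumptions: the scope below is a hypothetical three-variable stand-in, forward_candidates is a made-up helper name, and the get.visit bookkeeping from the real function is left out.

```r
# Hypothetical scope: each term lists its options from simplest to most
# complex, in the layout gam.scope is documented to produce.
scope <- list(
  x1 = c("1", "x1", "s(x1, df=4)"),
  x2 = c("1", "x2", "s(x2, df=4)"),
  f1 = c("1", "f1")
)
term.lengths <- lengths(scope)

# Count the forward moves available from a starting position.
# `items` holds, per term, the index of the option currently in the model.
forward_candidates <- function(items, term.lengths) {
  n <- 0
  for (i in seq_along(items)) {
    trial <- items
    trial[i] <- trial[i] + 1
    # Same bound check as in step.gam; the visited check is omitted here.
    if (trial[i] <= term.lengths[i]) n <- n + 1
  }
  n
}

full_start <- term.lengths           # every term already at its last option
null_start <- rep(1, length(scope))  # intercept-only: every term at "1"

forward_candidates(full_start, term.lengths)  # no candidates
forward_candidates(null_start, term.lengths)  # one candidate per term
```

Starting from the full model yields no candidates at all, while the intercept-only start yields one per term, which is the difference between getting NULL and getting a fitted model back.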

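Solution

Given all of the above, the solution is not to start from the full formula when using forward selection. For demonstration, I will use an intercept-only model as the starting model: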

library(gam)
load(url("https://github.com/cornejom/DataSets/raw/master/mydata.Rdata"))
mygam <- gam(Response ~ 1, family = "binomial", mydata)

The last line is the only thing that needs to change. Everything else is the same as in the original post:

myscope <- gam.scope(mydata, response = 3, arg = "df=4")
mygam.step <- step.gam(mygam, myscope, direction = "forward")

Now the procedure works.