以编程方式通过 glm() 将 family= 传递给 step()

pass family= to step() via glm() programmatically

我试图通过模拟来演示不同模型的性能和特征选择技术,因此我希望以编程方式将各种参数传递给 glm()

?glm下我们读到(斜体是我的):

family: a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

问题是当我随后在生成的模型上调用 step() 时,似乎存在范围界定问题并且不再识别 family= 参数。

这是一个最小的例子:

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

  model_fam <- match.arg(family, c("gaussian", "binomial"))

  fit_null <- glm(update(formula,".~1"), 
                   family = model_fam, 
                   data = data)

  message("So far so good")

  fit_stepBIC <- step(fit_null, 
                      formula, 
                      direction="forward",
                      k = log(nrow(data)),
                      trace=0)

  message("Doesn't make it this far")

  fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)

带回溯的错误消息:

> getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)
So far so good

 Error in stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam,  : 
  object 'model_fam' not found 
9 stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam, 
    data = data, method = "model.frame") 
8 eval(expr, envir, enclos) 
7 eval(fcall, env) 
6 model.frame.glm(fob, xlev = object$xlevels) 
5 model.frame(fob, xlev = object$xlevels) 
4 add1.glm(fit, scope$add, scale = scale, trace = trace, k = k, 
    ...) 
3 add1(fit, scope$add, scale = scale, trace = trace, k = k, ...) 
2 step(fit_null, formula, direction = "forward", k = log(nrow(data)), 
    trace = 0) 
1 getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", 
    data = iris) 

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4       

传递此参数以便被步骤识别的最自然方式是什么?我知道的一种可能的解决方法是通过以 model_fam.

为条件的 if-then-else 使用显式姓氏调用 glm()

我认为以下基于 evalbquote.() 的解决方案可能会解决您的问题。

我还安装了 R-version 3.2.4,我得到了与您从代码中得到的完全相同的错误。下面的解决方案使它在我的电脑上工作。

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

    model_fam <- match.arg(family, c("gaussian", "binomial"))

    fit_null <- eval(bquote(
        glm(update(.(formula),".~1"), 
            family = .(model_fam), 
            data = .(data))))

    message("So far so good")

    fit_stepBIC <- step(fit_null, 
                        formula, 
                        direction="forward",
                        k = log(nrow(data)),
                        trace=0)

    message("Doesn't make it this far")

    fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
 getCoef(formula = Petal.Length ~ Petal.Width + Species,
        family = "gaussian",
        data = iris)

So far so good
Doesn't make it this far
      (Intercept) Speciesversicolor  Speciesvirginica       Petal.Width 
         1.211397          1.697791          2.276693          1.018712   

问题是step最终调用了model.frame并且model.frame在一个特殊的环境,即定义公式的环境中计算terms对象。这通常是调用 getCoef 的环境。但是在这个环境中 model_fam 不存在,因为它是在 getCoef 中定义的。修复它的一种方法是添加

environment(formula) <- environment()

之后
model_fam <- match.arg(family, c("gaussian", "binomial"))

或类似的东西。