使用包内的自定义方差函数从 gaulss-gams 进行预测时的环境问题
Environmental problems while predicting from gaulss-gams with a custom variance function inside a package
未找到 R 包中可用的方差函数
predict
函数同时根据 gam-object
进行预测
先前构建 (mgcv 1.8-31)。
R 包应(除其他事项外)从 gam 对象进行预测。全部
以前构建的模型使用 gaulss 系列并有他们的
自己的方差函数。一些方差函数只是线性效应
一个变量,其他人使用更复杂的自定义函数。这
模型和方差函数存储在 'sysdata.rda' 文件中以
将它们包含在包中。该软件包记录在 devtools
和 roxygen2
.
考虑以下两个 GAM 的最小示例:
library(mgcv)
set.seed(123)
var.fun <- function(x){x^2}
x <- runif(100)
y <- x + rnorm(100, 0, var.fun(x))
mod.gam.1 <- gam(formula = list(y ~ x,
~ var.fun(x)),
family = gaulss(link = list("log", "logb")))
mod.gam.2 <- gam(formula = list(y ~ x,
~ I(x^2)),
family = gaulss(link = list("log", "logb")))
第一个模型使用自定义方差函数。第二个模型有
模型调用中硬编码的方差公式。
然后将模型和方差函数存储在 'sysdata.rda'
文件,它包含在名为 "gamvarfun" 的包中(我知道,命名
事情...),如下:
save(mod.gam.1, var.fun, mod.gam.2,
file = "~/gamvarfun/R/sysdata.rda")
现在包中添加了两个函数来从中检索预测
对应机型:
pred.fun.1 <- function(x){
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
pred.fun.2 <- function(x){
predict(mod.gam.2,
newdata = data.frame("x" = x))
}
为了说明问题,我上传了一个非常基本的演示包
github,所以下面的代码应该可以工作:
library(devtools)
install_github("jan-schick/gam_issue")
library(gamvarfun)
pred.fun.2(1)
# 0.3135275 0.5443409
pred.fun.1(1)
# Error in var.fun(x) : could not find function "var.fun"
# Writing var.fun to global environment
var.fun <- gamvarfun:::var.fun
pred.fun.1(1)
# 0.3135275 0.5443409
使用pred.fun.1
(包含自定义方差函数)时,一个
显示错误。但是,pred.fun.2
(硬编码方差函数)
工作得很好。不会发生此错误,当
devtools::load_all()
用于代替 'proper' 安装
包裹。怀疑是环境不同导致的问题
使用 predict.gam
时。我通过编写自定义来测试这个假设
调用 pred.fun.1
之前对全局环境的方差函数
(见上文),这是有效的。然而,这显然不是一个解决方案
包。
在早期的尝试中,我试图在包内声明函数,
例如通过将上面编写的代码直接放在预测中
职能。安装包时也不起作用。
pred.fun.1 <- function(x){
var.fun <- function(x){x^2}
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
我也试过 with
和 attach
在同一个地方(在
预测功能)没有任何成功。
我找到的唯一可行的解决方案是导出方差函数
并将其作为包的一部分 namespace/API。这是不可行的
在这种情况下的解决方案,因为它会导致许多可见的差异
功能,对用户没有实际好处。
然后有一个明显的解决方法:替换方差函数
通过模型调用中的原始公式,即使用 mod.gam.2
而不是 mod.gam.1
。但是,这不是一个合适的解决方案
要么。
请教了各种搜索引擎和同行无果。
因此,如能提供解决此问题的任何提示,我将不胜感激
问题
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gamvarfun_0.1.0 usethis_1.5.0 devtools_2.0.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 splines_3.6.2
[5] pkgload_1.0.2 lattice_0.20-38 R6_2.4.0 rlang_0.4.0
[9] tools_3.6.2 grid_3.6.2 pkgbuild_1.0.3 nlme_3.1-143
[13] mgcv_1.8-31 sessioninfo_1.1.1 cli_1.1.0 withr_2.1.2
[17] remotes_2.0.4 assertthat_0.2.1 digest_0.6.20 rprojroot_1.3-2
[21] crayon_1.3.4 Matrix_1.2-18 processx_3.3.1 callr_3.2.0
[25] fs_1.3.1 ps_1.3.0 curl_4.0 testthat_2.1.1
[29] memoise_1.1.0 glue_1.3.1 compiler_3.6.2 desc_1.2.0
[33] backports_1.1.4 prettyunits_1.0.2
问题是公式有与之关联的环境,并且您正在保存与 globalenv()
的关联,但将函数放在其他地方。
如果您坚持使用 sysdata.rda
,这并不容易修复,因为 mod.gam.1
对象将该环境复制到多个位置。打补丁 environment(mod.gam.1$formula)
.
还不够
我认为唯一可行的方法是将生成 mod.gam.1
对象的源包含在包源中。如果公式是在从包命名空间继承的上下文中定义的,那么将找到在那里定义的方差函数。
编辑添加:我试过这个策略,但没能成功。 mgcv
包中似乎存在错误,特别是在解释 gualss
模型使用的特殊对偶公式的代码中。我会看看是否可以找到解决方法。
第二次编辑: 运行 下面的代码似乎修复了 mgcv
中存在错误的函数。它们基于 mgcv
版本 1.8-31。我不推荐 运行 这个脚本,除非你 完全 安装了那个版本。我已经向包维护者发送了一条消息;也许这些会被合并到未来的版本中。
interpret.gam0 <- function(gf,textra=NULL,extra.special=NULL)
# interprets a gam formula of the generic form:
# y~x0+x1+x3*x4 + s(x5)+ s(x6,x7) ....
# and returns:
# 1. a model formula for the parametric part: pf (and pfok indicating whether it has terms)
# 2. a list of descriptors for the smooths: smooth.spec
# this is function does the work, and is called by in interpret.gam
# 'textra' is optional text to add to term labels
# 'extra.special' is label of extra smooth within formula.
{ p.env <- environment(gf) # environment of formula
tf <- terms.formula(gf,specials=c("s","te","ti","t2",extra.special)) # specials attribute indicates which terms are smooth
terms <- attr(tf,"term.labels") # labels of the model terms
nt <- length(terms) # how many terms?
if (attr(tf,"response") > 0) { # start the replacement formulae
response <- as.character(attr(tf,"variables")[2])
} else {
response <- NULL
}
sp <- attr(tf,"specials")$s # array of indices of smooth terms
tp <- attr(tf,"specials")$te # indices of tensor product terms
tip <- attr(tf,"specials")$ti # indices of tensor product pure interaction terms
t2p <- attr(tf,"specials")$t2 # indices of type 2 tensor product terms
zp <- if (is.null(extra.special)) NULL else attr(tf,"specials")[[extra.special]]
off <- attr(tf,"offset") # location of offset in formula
## have to translate sp, tp, tip, t2p (zp) so that they relate to terms,
## rather than elements of the formula...
vtab <- attr(tf,"factors") # cross tabulation of vars to terms
if (length(sp)>0) for (i in 1:length(sp)) {
ind <- (1:nt)[as.logical(vtab[sp[i],])]
sp[i] <- ind # the term that smooth relates to
}
if (length(tp)>0) for (i in 1:length(tp)) {
ind <- (1:nt)[as.logical(vtab[tp[i],])]
tp[i] <- ind # the term that smooth relates to
}
if (length(tip)>0) for (i in 1:length(tip)) {
ind <- (1:nt)[as.logical(vtab[tip[i],])]
tip[i] <- ind # the term that smooth relates to
}
if (length(t2p)>0) for (i in 1:length(t2p)) {
ind <- (1:nt)[as.logical(vtab[t2p[i],])]
t2p[i] <- ind # the term that smooth relates to
}
if (length(zp)>0) for (i in 1:length(zp)) {
ind <- (1:nt)[as.logical(vtab[zp[i],])]
zp[i] <- ind # the term that smooth relates to
} ## re-referencing is complete
k <- kt <- kti <- kt2 <- ks <- kz <- kp <- 1 # counters for terms in the 2 formulae
len.sp <- length(sp)
len.tp <- length(tp)
len.tip <- length(tip)
len.t2p <- length(t2p)
len.zp <- length(zp)
ns <- len.sp + len.tp + len.tip + len.t2p + len.zp# number of smooths
pav <- av <- rep("",0)
smooth.spec <- list()
#mgcvat <- "package:mgcv" %in% search() ## is mgcv in search path?
mgcvns <- loadNamespace('mgcv')
if (nt) for (i in 1:nt) { # work through all terms
if (k <= ns&&((ks<=len.sp&&sp[ks]==i)||(kt<=len.tp&&tp[kt]==i)||(kz<=len.zp&&zp[kz]==i)||
(kti<=len.tip&&tip[kti]==i)||(kt2<=len.t2p&&t2p[kt2]==i))) { # it's a smooth
## have to evaluate in the environment of the formula or you can't find variables
## supplied as smooth arguments, e.g. k <- 5;gam(y~s(x,k=k)), fails,
## but if you don't specify namespace of mgcv then stuff like
## loadNamespace('mgcv'); k <- 10; mgcv::interpret.gam(y~s(x,k=k)) fails (can't find s)
## eval(parse(text=terms[i]),envir=p.env,enclos=loadNamespace('mgcv')) fails??
## following may supply namespace of mgcv explicitly if not on search path...
## If 's' etc are masked then we can fail even if mgcv on search path, hence paste
## of explicit mgcv reference into first attempt...
st <- try(eval(parse(text=paste("mgcv::",terms[i],sep="")),envir=p.env),silent=TRUE)
if (inherits(st,"try-error")) st <-
eval(parse(text=terms[i]),enclos=p.env,envir=mgcvns)
if (!is.null(textra)) { ## modify the labels on smooths with textra
pos <- regexpr("(",st$lab,fixed=TRUE)[1]
st$label <- paste(substr(st$label,start=1,stop=pos-1),textra,
substr(st$label,start=pos,stop=nchar(st$label)),sep="")
}
smooth.spec[[k]] <- st
if (ks<=len.sp&&sp[ks]==i) ks <- ks + 1 else # counts s() terms
if (kt<=len.tp&&tp[kt]==i) kt <- kt + 1 else # counts te() terms
if (kti<=len.tip&&tip[kti]==i) kti <- kti + 1 else # counts ti() terms
if (kt2<=len.t2p&&t2p[kt2]==i) kt2 <- kt2 + 1 # counts t2() terms
else kz <- kz + 1
k <- k + 1 # counts smooth terms
} else { # parametric
av[kp] <- terms[i] ## element kp on rhs of parametric
kp <- kp+1 # counts parametric terms
}
}
if (!is.null(off)) { ## deal with offset
av[kp] <- as.character(attr(tf,"variables")[1+off])
kp <- kp+1
}
pf <- paste(response,"~",paste(av,collapse=" + "))
if (attr(tf,"intercept")==0) {
pf <- paste(pf,"-1",sep="")
if (kp>1) pfok <- 1 else pfok <- 0
} else {
pfok <- 1;if (kp==1) {
pf <- paste(pf,"1");
}
}
fake.formula <- pf
if (length(smooth.spec)>0)
for (i in 1:length(smooth.spec)) {
nt <- length(smooth.spec[[i]]$term)
ff1 <- paste(smooth.spec[[i]]$term[1:nt],collapse="+")
fake.formula <- paste(fake.formula,"+",ff1)
if (smooth.spec[[i]]$by!="NA") {
fake.formula <- paste(fake.formula,"+",smooth.spec[[i]]$by)
av <- c(av,smooth.spec[[i]]$term,smooth.spec[[i]]$by)
} else av <- c(av,smooth.spec[[i]]$term)
}
fake.formula <- as.formula(fake.formula,p.env)
if (length(av)) {
pred.formula <- as.formula(paste("~",paste(av,collapse="+")))
pav <- all.vars(pred.formula) ## trick to strip out 'offset(x)' etc...
pred.formula <- reformulate(pav)
environment(pred.formula) <- environment(gf)
} else pred.formula <- ~1
ret <- list(pf=as.formula(pf,p.env),pfok=pfok,smooth.spec=smooth.spec,
fake.formula=fake.formula,response=response,fake.names=av,
pred.names=pav,pred.formula=pred.formula)
class(ret) <- "split.gam.formula"
ret
} ## interpret.gam0
interpret.gam <- function(gf,extra.special=NULL) {
## wrapper to allow gf to be a list of formulae or
## a single formula. This facilitates general penalized
## likelihood models in which several linear predictors
## may be involved...
##
## The list syntax is as follows. The first formula must have a response on
## the lhs, rather than labels. For m linear predictors, there
## must be m 'base formulae' in linear predictor order. lhs labels will
## be ignored in a base formula. Empty base formulae have '-1' on rhs.
## Further formulae have labels up to m labels 1,...,m on the lhs, in a
## syntax like this: 3 + 5 ~ s(x), which indicates that the same s(x)
## should be added to both linear predictors 3 and 5.
## e.g. A bivariate normal model with common expected values might be
## list(y1~-1,y2~-1,1+2~s(x)), whereas if the second component was contaminated
## by something else we might have list(y1~-1,y2~s(v)-1,1+2~s(x))
##
## For a list argument, this routine returns a list of split.formula objects
## with an extra field "lpi" indicating the linear predictors to which each
## contributes...
if (is.list(gf)) {
d <- length(gf)
## make sure all formulae have a response, to avoid
## problems with parametric sub formulae of the form ~1
#if (length(gf[[1]])<3) stop("first formula must specify a response")
resp <- gf[[1]][2]
ret <- list()
pav <- av <- rep("",0)
nlp <- 0 ## count number of linear predictors (may be different from number of formulae)
for (i in 1:d) {
textra <- if (i==1) NULL else paste(".",i-1,sep="") ## modify smooth labels to identify to predictor
lpi <- getNumericResponse(gf[[i]]) ## get linear predictors to which this applies, if explicit
if (length(lpi)==1) warning("single linear predictor indices are ignored")
if (length(lpi)>0) gf[[i]][[2]] <- NULL else { ## delete l.p. labels from formula response
nlp <- nlp + 1;lpi <- nlp ## this is base formula for l.p. number nlp
}
ret[[i]] <- interpret.gam0(gf[[i]],textra,extra.special=extra.special)
ret[[i]]$lpi <- lpi ## record of the linear predictors to which this applies
## make sure all parametric formulae have a response, to avoid
## problems with parametric sub formulae of the form ~1
respi <- rep("",0) ## no extra response terms
if (length(ret[[i]]$pf)==2) {
ret[[i]]$pf[3] <- ret[[i]]$pf[2];ret[[i]]$pf[2] <- resp
respi <- rep("",0)
} else if (i>1) respi <- ret[[i]]$response ## extra response terms
av <- c(av,ret[[i]]$fake.names,respi) ## accumulate all required variable names
pav <- c(pav,ret[[i]]$pred.names) ## predictors only
}
av <- unique(av) ## strip out duplicate variable names
pav <- unique(pav)
if (length(av)>0) {
## work around - reformulate with response = "log(x)" will treat log(x) as a name,
## not the call it should be...
fff <- formula(paste(ret[[1]]$response,"~ ."))
ret$fake.formula <- reformulate(av,response=ret[[1]]$response)
environment(ret$fake.formula) <- environment(gf[[1]])
ret$fake.formula[[2]] <- fff[[2]] ## fix messed up response
} else ret$fake.formula <- ret[[1]]$fake.formula ## create fake formula containing all variables
ret$pred.formula <- if (length(pav)>0) reformulate(pav) else ~1 ## predictor only formula
environment(ret$pred.formula) <- environment(gf[[1]])
ret$response <- ret[[1]]$response
ret$nlp <- nlp ## number of linear predictors
for (i in 1:d) if (max(ret[[i]]$lpi)>nlp||min(ret[[i]]$lpi)<1) stop("linear predictor labels out of range")
class(ret) <- "split.gam.formula"
return(ret)
} else interpret.gam0(gf,extra.special=extra.special)
} ## interpret.gam
## Now some test code.
environment(interpret.gam) <- environment(mgcv::interpret.gam)
environment(interpret.gam0) <- environment(mgcv:::interpret.gam0)
assignInNamespace("interpret.gam", interpret.gam, "mgcv")
assignInNamespace("interpret.gam0", interpret.gam0, "mgcv")
set.seed(123)
mod.gam.1 <- local({
var.fun <- function(x){x^2}
x <- runif(100)
y <- x + rnorm(100, 0, var.fun(x))
gam(formula = list(y ~ x,
~ var.fun(x)),
family = gaulss(link = list("log", "logb")))
})
pred.fun.1 <- function(x){
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
pred.fun.1(1)
未找到 R 包中可用的方差函数
predict
函数同时根据 gam-object
进行预测
先前构建 (mgcv 1.8-31)。
R 包应(除其他事项外)从 gam 对象进行预测。全部
以前构建的模型使用 gaulss 系列并有他们的
自己的方差函数。一些方差函数只是线性效应
一个变量,其他人使用更复杂的自定义函数。这
模型和方差函数存储在 'sysdata.rda' 文件中以
将它们包含在包中。该软件包记录在 devtools
和 roxygen2
.
考虑以下两个 GAM 的最小示例:
library(mgcv)
set.seed(123)
var.fun <- function(x){x^2}
x <- runif(100)
y <- x + rnorm(100, 0, var.fun(x))
mod.gam.1 <- gam(formula = list(y ~ x,
~ var.fun(x)),
family = gaulss(link = list("log", "logb")))
mod.gam.2 <- gam(formula = list(y ~ x,
~ I(x^2)),
family = gaulss(link = list("log", "logb")))
第一个模型使用自定义方差函数。第二个模型有 模型调用中硬编码的方差公式。
然后将模型和方差函数存储在 'sysdata.rda' 文件,它包含在名为 "gamvarfun" 的包中(我知道,命名 事情...),如下:
save(mod.gam.1, var.fun, mod.gam.2,
file = "~/gamvarfun/R/sysdata.rda")
现在包中添加了两个函数来从中检索预测 对应机型:
pred.fun.1 <- function(x){
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
pred.fun.2 <- function(x){
predict(mod.gam.2,
newdata = data.frame("x" = x))
}
为了说明问题,我上传了一个非常基本的演示包 github,所以下面的代码应该可以工作:
library(devtools)
install_github("jan-schick/gam_issue")
library(gamvarfun)
pred.fun.2(1)
# 0.3135275 0.5443409
pred.fun.1(1)
# Error in var.fun(x) : could not find function "var.fun"
# Writing var.fun to global environment
var.fun <- gamvarfun:::var.fun
pred.fun.1(1)
# 0.3135275 0.5443409
使用pred.fun.1
(包含自定义方差函数)时,一个
显示错误。但是,pred.fun.2
(硬编码方差函数)
工作得很好。不会发生此错误,当
devtools::load_all()
用于代替 'proper' 安装
包裹。怀疑是环境不同导致的问题
使用 predict.gam
时。我通过编写自定义来测试这个假设
调用 pred.fun.1
之前对全局环境的方差函数
(见上文),这是有效的。然而,这显然不是一个解决方案
包。
在早期的尝试中,我试图在包内声明函数, 例如通过将上面编写的代码直接放在预测中 职能。安装包时也不起作用。
pred.fun.1 <- function(x){
var.fun <- function(x){x^2}
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
我也试过 with
和 attach
在同一个地方(在
预测功能)没有任何成功。
我找到的唯一可行的解决方案是导出方差函数 并将其作为包的一部分 namespace/API。这是不可行的 在这种情况下的解决方案,因为它会导致许多可见的差异 功能,对用户没有实际好处。
然后有一个明显的解决方法:替换方差函数
通过模型调用中的原始公式,即使用 mod.gam.2
而不是 mod.gam.1
。但是,这不是一个合适的解决方案
要么。
请教了各种搜索引擎和同行无果。
因此,如能提供解决此问题的任何提示,我将不胜感激 问题
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gamvarfun_0.1.0 usethis_1.5.0 devtools_2.0.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 splines_3.6.2
[5] pkgload_1.0.2 lattice_0.20-38 R6_2.4.0 rlang_0.4.0
[9] tools_3.6.2 grid_3.6.2 pkgbuild_1.0.3 nlme_3.1-143
[13] mgcv_1.8-31 sessioninfo_1.1.1 cli_1.1.0 withr_2.1.2
[17] remotes_2.0.4 assertthat_0.2.1 digest_0.6.20 rprojroot_1.3-2
[21] crayon_1.3.4 Matrix_1.2-18 processx_3.3.1 callr_3.2.0
[25] fs_1.3.1 ps_1.3.0 curl_4.0 testthat_2.1.1
[29] memoise_1.1.0 glue_1.3.1 compiler_3.6.2 desc_1.2.0
[33] backports_1.1.4 prettyunits_1.0.2
问题是公式有与之关联的环境,并且您正在保存与 globalenv()
的关联,但将函数放在其他地方。
如果您坚持使用 sysdata.rda
,这并不容易修复,因为 mod.gam.1
对象将该环境复制到多个位置。打补丁 environment(mod.gam.1$formula)
.
我认为唯一可行的方法是将生成 mod.gam.1
对象的源包含在包源中。如果公式是在从包命名空间继承的上下文中定义的,那么将找到在那里定义的方差函数。
编辑添加:我试过这个策略,但没能成功。 mgcv
包中似乎存在错误,特别是在解释 gualss
模型使用的特殊对偶公式的代码中。我会看看是否可以找到解决方法。
第二次编辑: 运行 下面的代码似乎修复了 mgcv
中存在错误的函数。它们基于 mgcv
版本 1.8-31。我不推荐 运行 这个脚本,除非你 完全 安装了那个版本。我已经向包维护者发送了一条消息;也许这些会被合并到未来的版本中。
interpret.gam0 <- function(gf,textra=NULL,extra.special=NULL)
# interprets a gam formula of the generic form:
# y~x0+x1+x3*x4 + s(x5)+ s(x6,x7) ....
# and returns:
# 1. a model formula for the parametric part: pf (and pfok indicating whether it has terms)
# 2. a list of descriptors for the smooths: smooth.spec
# this is function does the work, and is called by in interpret.gam
# 'textra' is optional text to add to term labels
# 'extra.special' is label of extra smooth within formula.
{ p.env <- environment(gf) # environment of formula
tf <- terms.formula(gf,specials=c("s","te","ti","t2",extra.special)) # specials attribute indicates which terms are smooth
terms <- attr(tf,"term.labels") # labels of the model terms
nt <- length(terms) # how many terms?
if (attr(tf,"response") > 0) { # start the replacement formulae
response <- as.character(attr(tf,"variables")[2])
} else {
response <- NULL
}
sp <- attr(tf,"specials")$s # array of indices of smooth terms
tp <- attr(tf,"specials")$te # indices of tensor product terms
tip <- attr(tf,"specials")$ti # indices of tensor product pure interaction terms
t2p <- attr(tf,"specials")$t2 # indices of type 2 tensor product terms
zp <- if (is.null(extra.special)) NULL else attr(tf,"specials")[[extra.special]]
off <- attr(tf,"offset") # location of offset in formula
## have to translate sp, tp, tip, t2p (zp) so that they relate to terms,
## rather than elements of the formula...
vtab <- attr(tf,"factors") # cross tabulation of vars to terms
if (length(sp)>0) for (i in 1:length(sp)) {
ind <- (1:nt)[as.logical(vtab[sp[i],])]
sp[i] <- ind # the term that smooth relates to
}
if (length(tp)>0) for (i in 1:length(tp)) {
ind <- (1:nt)[as.logical(vtab[tp[i],])]
tp[i] <- ind # the term that smooth relates to
}
if (length(tip)>0) for (i in 1:length(tip)) {
ind <- (1:nt)[as.logical(vtab[tip[i],])]
tip[i] <- ind # the term that smooth relates to
}
if (length(t2p)>0) for (i in 1:length(t2p)) {
ind <- (1:nt)[as.logical(vtab[t2p[i],])]
t2p[i] <- ind # the term that smooth relates to
}
if (length(zp)>0) for (i in 1:length(zp)) {
ind <- (1:nt)[as.logical(vtab[zp[i],])]
zp[i] <- ind # the term that smooth relates to
} ## re-referencing is complete
k <- kt <- kti <- kt2 <- ks <- kz <- kp <- 1 # counters for terms in the 2 formulae
len.sp <- length(sp)
len.tp <- length(tp)
len.tip <- length(tip)
len.t2p <- length(t2p)
len.zp <- length(zp)
ns <- len.sp + len.tp + len.tip + len.t2p + len.zp# number of smooths
pav <- av <- rep("",0)
smooth.spec <- list()
#mgcvat <- "package:mgcv" %in% search() ## is mgcv in search path?
mgcvns <- loadNamespace('mgcv')
if (nt) for (i in 1:nt) { # work through all terms
if (k <= ns&&((ks<=len.sp&&sp[ks]==i)||(kt<=len.tp&&tp[kt]==i)||(kz<=len.zp&&zp[kz]==i)||
(kti<=len.tip&&tip[kti]==i)||(kt2<=len.t2p&&t2p[kt2]==i))) { # it's a smooth
## have to evaluate in the environment of the formula or you can't find variables
## supplied as smooth arguments, e.g. k <- 5;gam(y~s(x,k=k)), fails,
## but if you don't specify namespace of mgcv then stuff like
## loadNamespace('mgcv'); k <- 10; mgcv::interpret.gam(y~s(x,k=k)) fails (can't find s)
## eval(parse(text=terms[i]),envir=p.env,enclos=loadNamespace('mgcv')) fails??
## following may supply namespace of mgcv explicitly if not on search path...
## If 's' etc are masked then we can fail even if mgcv on search path, hence paste
## of explicit mgcv reference into first attempt...
st <- try(eval(parse(text=paste("mgcv::",terms[i],sep="")),envir=p.env),silent=TRUE)
if (inherits(st,"try-error")) st <-
eval(parse(text=terms[i]),enclos=p.env,envir=mgcvns)
if (!is.null(textra)) { ## modify the labels on smooths with textra
pos <- regexpr("(",st$lab,fixed=TRUE)[1]
st$label <- paste(substr(st$label,start=1,stop=pos-1),textra,
substr(st$label,start=pos,stop=nchar(st$label)),sep="")
}
smooth.spec[[k]] <- st
if (ks<=len.sp&&sp[ks]==i) ks <- ks + 1 else # counts s() terms
if (kt<=len.tp&&tp[kt]==i) kt <- kt + 1 else # counts te() terms
if (kti<=len.tip&&tip[kti]==i) kti <- kti + 1 else # counts ti() terms
if (kt2<=len.t2p&&t2p[kt2]==i) kt2 <- kt2 + 1 # counts t2() terms
else kz <- kz + 1
k <- k + 1 # counts smooth terms
} else { # parametric
av[kp] <- terms[i] ## element kp on rhs of parametric
kp <- kp+1 # counts parametric terms
}
}
if (!is.null(off)) { ## deal with offset
av[kp] <- as.character(attr(tf,"variables")[1+off])
kp <- kp+1
}
pf <- paste(response,"~",paste(av,collapse=" + "))
if (attr(tf,"intercept")==0) {
pf <- paste(pf,"-1",sep="")
if (kp>1) pfok <- 1 else pfok <- 0
} else {
pfok <- 1;if (kp==1) {
pf <- paste(pf,"1");
}
}
fake.formula <- pf
if (length(smooth.spec)>0)
for (i in 1:length(smooth.spec)) {
nt <- length(smooth.spec[[i]]$term)
ff1 <- paste(smooth.spec[[i]]$term[1:nt],collapse="+")
fake.formula <- paste(fake.formula,"+",ff1)
if (smooth.spec[[i]]$by!="NA") {
fake.formula <- paste(fake.formula,"+",smooth.spec[[i]]$by)
av <- c(av,smooth.spec[[i]]$term,smooth.spec[[i]]$by)
} else av <- c(av,smooth.spec[[i]]$term)
}
fake.formula <- as.formula(fake.formula,p.env)
if (length(av)) {
pred.formula <- as.formula(paste("~",paste(av,collapse="+")))
pav <- all.vars(pred.formula) ## trick to strip out 'offset(x)' etc...
pred.formula <- reformulate(pav)
environment(pred.formula) <- environment(gf)
} else pred.formula <- ~1
ret <- list(pf=as.formula(pf,p.env),pfok=pfok,smooth.spec=smooth.spec,
fake.formula=fake.formula,response=response,fake.names=av,
pred.names=pav,pred.formula=pred.formula)
class(ret) <- "split.gam.formula"
ret
} ## interpret.gam0
interpret.gam <- function(gf,extra.special=NULL) {
## wrapper to allow gf to be a list of formulae or
## a single formula. This facilitates general penalized
## likelihood models in which several linear predictors
## may be involved...
##
## The list syntax is as follows. The first formula must have a response on
## the lhs, rather than labels. For m linear predictors, there
## must be m 'base formulae' in linear predictor order. lhs labels will
## be ignored in a base formula. Empty base formulae have '-1' on rhs.
## Further formulae have labels up to m labels 1,...,m on the lhs, in a
## syntax like this: 3 + 5 ~ s(x), which indicates that the same s(x)
## should be added to both linear predictors 3 and 5.
## e.g. A bivariate normal model with common expected values might be
## list(y1~-1,y2~-1,1+2~s(x)), whereas if the second component was contaminated
## by something else we might have list(y1~-1,y2~s(v)-1,1+2~s(x))
##
## For a list argument, this routine returns a list of split.formula objects
## with an extra field "lpi" indicating the linear predictors to which each
## contributes...
if (is.list(gf)) {
d <- length(gf)
## make sure all formulae have a response, to avoid
## problems with parametric sub formulae of the form ~1
#if (length(gf[[1]])<3) stop("first formula must specify a response")
resp <- gf[[1]][2]
ret <- list()
pav <- av <- rep("",0)
nlp <- 0 ## count number of linear predictors (may be different from number of formulae)
for (i in 1:d) {
textra <- if (i==1) NULL else paste(".",i-1,sep="") ## modify smooth labels to identify to predictor
lpi <- getNumericResponse(gf[[i]]) ## get linear predictors to which this applies, if explicit
if (length(lpi)==1) warning("single linear predictor indices are ignored")
if (length(lpi)>0) gf[[i]][[2]] <- NULL else { ## delete l.p. labels from formula response
nlp <- nlp + 1;lpi <- nlp ## this is base formula for l.p. number nlp
}
ret[[i]] <- interpret.gam0(gf[[i]],textra,extra.special=extra.special)
ret[[i]]$lpi <- lpi ## record of the linear predictors to which this applies
## make sure all parametric formulae have a response, to avoid
## problems with parametric sub formulae of the form ~1
respi <- rep("",0) ## no extra response terms
if (length(ret[[i]]$pf)==2) {
ret[[i]]$pf[3] <- ret[[i]]$pf[2];ret[[i]]$pf[2] <- resp
respi <- rep("",0)
} else if (i>1) respi <- ret[[i]]$response ## extra response terms
av <- c(av,ret[[i]]$fake.names,respi) ## accumulate all required variable names
pav <- c(pav,ret[[i]]$pred.names) ## predictors only
}
av <- unique(av) ## strip out duplicate variable names
pav <- unique(pav)
if (length(av)>0) {
## work around - reformulate with response = "log(x)" will treat log(x) as a name,
## not the call it should be...
fff <- formula(paste(ret[[1]]$response,"~ ."))
ret$fake.formula <- reformulate(av,response=ret[[1]]$response)
environment(ret$fake.formula) <- environment(gf[[1]])
ret$fake.formula[[2]] <- fff[[2]] ## fix messed up response
} else ret$fake.formula <- ret[[1]]$fake.formula ## create fake formula containing all variables
ret$pred.formula <- if (length(pav)>0) reformulate(pav) else ~1 ## predictor only formula
environment(ret$pred.formula) <- environment(gf[[1]])
ret$response <- ret[[1]]$response
ret$nlp <- nlp ## number of linear predictors
for (i in 1:d) if (max(ret[[i]]$lpi)>nlp||min(ret[[i]]$lpi)<1) stop("linear predictor labels out of range")
class(ret) <- "split.gam.formula"
return(ret)
} else interpret.gam0(gf,extra.special=extra.special)
} ## interpret.gam
## Now some test code.
environment(interpret.gam) <- environment(mgcv::interpret.gam)
environment(interpret.gam0) <- environment(mgcv:::interpret.gam0)
assignInNamespace("interpret.gam", interpret.gam, "mgcv")
assignInNamespace("interpret.gam0", interpret.gam0, "mgcv")
set.seed(123)
mod.gam.1 <- local({
var.fun <- function(x){x^2}
x <- runif(100)
y <- x + rnorm(100, 0, var.fun(x))
gam(formula = list(y ~ x,
~ var.fun(x)),
family = gaulss(link = list("log", "logb")))
})
pred.fun.1 <- function(x){
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
pred.fun.1(1)