使用 reformulate 合并非标准公式

using reformulate to merge non-standard formula

给定:

vars <- c("var1", "var2", "var3", "var4")
mm_exp <- expression(
  f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE), 
  f(testm2, testmodel = 'fg'),
  f(testm3, testmodel = 'fg3')
)

我想使用 mm_expvars 的所有组合(combn)生成 formulas 以输入模型:

#y ~ var1 + var2 + var3 + var4 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')

#y ~ var1 + var2 + var3 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')

#y ~ var1 + var2 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')

#y ~ var1 + var4 + f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
#  truetest2 = TRUE) + f(testm2, testmodel = "fg") + f(testm3, testmodel = 'fg3')


#etc.....

如果我简化 mm_exp,我可以使用 reformulate 得到类似于我想要的东西(暂时忽略 combn):

mm_exp_simplify <- expression(
  f(testm, testmodel = 'test', testgraph = g), 
  f(testm2, testmodel = 'fg'),
  f(testm3, testmodel = 'fg3')
)
reformulate(c(vars, sapply(mm_exp_simplify, deparse)), "y")
# y ~ var1 + var2 + var3 + var4 + f(testm, testmodel = "test", 
#     testgraph = g) + f(testm2, testmodel = "fg") + f(testm3, 
#     testmodel = "fg3")

但是如果我在 truetest1 = TRUE, truetest2 = TRUE 中重新添加它会导致问题:

mm_exp <- expression(
  f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE), 
  f(testm2, testmodel = 'fg'),
  f(testm3, testmodel = 'fg3')
)
reformulate(c(vars, sapply(mm_exp, deparse)), "y")
# Error in reformulate(c(vars, sapply(mm_exp, deparse)), "y") : 
#   'termlabels' must be a character vector of length at least one

我也试过使用 quote 但遇到了类似的问题:

mm_quote <- quote(
  f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE) + 
    f(testm2, testmodel = 'fg') + f(testm3, testmodel = 'fg3')
)
as.formula(paste0("y ~ ", paste(paste(vars, collapse = "+"), deparse(mm_quote), sep = "+")))
# Error in parse(text = x, keep.source = FALSE) : 
#   <text>:2:39: unexpected '='
# 1: y ~ var1+var2+var3+var4+f(testm, testmodel = "test", testgraph = g, truetest1 = TRUE, 
# 2: y ~ var1+var2+var3+var4+    truetest2 =
#                                          ^

有人对如何包含 truetest1 = TRUE, truetest2 = TRUE 以及如何获取公式的 combn 版本有建议吗?

谢谢

解决方案

要解决第一个问题,您需要使用deparse1而不是deparse。像这样:

reformulate(c(vars, sapply(mm_exp, deparse1)), "y")
#> y ~ var1 + var2 + var3 + var4 + f(testm, testmodel = "test", 
#>     testgraph = g, truetest1 = TRUE, truetest2 = TRUE) + f(testm2, 
#>     testmodel = "fg") + f(testm3, testmodel = "fg3")

关于你的第二个问题,首先你可以创建所有可能长度的组合,然后你可以这样创建所有公式的列表:

# all vars combinations 
vars_comb <- lapply(seq_along(vars), function(n) combn(vars, n, simplify = FALSE))
vars_comb <- unlist(vars_comb, recursive = FALSE)

# all formulas
lapply(vars_comb, function(v) reformulate(c(v, sapply(mm_exp, deparse1)), "y"))

为什么

其背后的原因与参数width.cutoff的默认值有关,即deparse中的width.cutoff = 60Ldeparse1中的width.cutoff = 500L

看看这个:

# output with deparse
deparse(expression(f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE)))
#> [1] "expression(f(testm, testmodel = \"test\", testgraph = g, truetest1 = TRUE, "
#> [2] "    truetest2 = TRUE))"

# output with deparse and width.cutoff forced to 500
deparse(expression(f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE)), 
        width.cutoff = 500)
#> [1] "expression(f(testm, testmodel = \"test\", testgraph = g, truetest1 = TRUE, truetest2 = TRUE))"

# output with deparse1
deparse1(expression(f(testm, testmodel = 'test', testgraph = g, truetest1 = TRUE, truetest2 = TRUE)))
#> [1] "expression(f(testm, testmodel = \"test\", testgraph = g, truetest1 = TRUE, truetest2 = TRUE))"

第一个 deparse 创建了一个长度为 2 的向量,它干扰了 reformulate,因为它创建了不合规的公式组件。


对于 R < 4.0

如果您像评论中所说的那样拥有 R 3.6,则 deparse1 不可用。 因此你需要在 deparse.

里面设置 width.cutoff = 500L

解决方案如下所示:

# first issue
reformulate(c(vars, sapply(mm_exp, deparse, width.cutoff = 500L)), "y")

# second issue
vars_comb <- lapply(seq_along(vars), function(n) combn(vars, n, simplify = FALSE))
vars_comb <- unlist(vars_comb, recursive = FALSE)
lapply(vars_comb, function(v) reformulate(c(v, sapply(mm_exp, deparse, width.cutoff = 500L)), "y"))