is.data.frame(data) object ... not found in function context

I have a strange problem using the cv.tree function from the tree package inside a user-defined function in R:
func <- function(train) {
    classification.tree <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, train, split = "gini")
    cv.tree(classification.tree, ,FUN=prune.tree, K = 4)
    return (classification.tree)
}
data(cpus, package="MASS")
result <- func(cpus)
plot(result)

This produces an error:

Error in is.data.frame(data) : object 'train' not found 
16 is.data.frame(data) 
15 model.frame.default(formula = log10(perf) ~ syct + mmin + mmax + 
    cach + chmin + chmax, data = train, subset = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25",  ... 
14 eval(expr, envir, enclos) 
13 eval(expr, p) 
12 eval.parent(m) 
11 tree(formula = log10(perf) ~ syct + mmin + mmax + cach + chmin + 
    chmax, data = train, split = "gini", subset = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25",  ... 
10 eval(expr, envir, enclos) 
9 eval(oc) 
8 model.frame.tree(object) 
7 model.frame(object) 
6 cv.tree(classification.tree, , FUN = prune.tree, K = 4) at .active-rstudio-document#4
5 func(cpus) at .active-rstudio-document#9
4 eval(expr, envir, enclos) 
3 eval(ei, envir) 
2 withVisible(eval(ei, envir)) 
1 source("~/.active-rstudio-document", echo = TRUE) 

However, if I run the same code directly at the top level of the script, it works fine:

data(cpus, package="MASS")
classification.tree <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus, split = "gini")
cv.tree(classification.tree, ,FUN=prune.tree, K = 4)
plot(classification.tree)

What am I missing?

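The crash happens in the cv.tree() call. Update: cv.tree calls model.frame, and inside that function there is an eval, but the variable train does not exist in that function's environment.

I'll keep digging.... If I step into model.frame in debug mode and change the data element of 'object' from 'train' to 'cpus', then eval finds the object and runs.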
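
The lookup failure described above can be reproduced in base R without the tree package. This is only a sketch: fit_like() and refit_like() are hypothetical stand-ins for tree() and model.frame(), not the package's actual code. fit_like() records its call, and refit_like() later evaluates the data argument of that call from its own frame, so a symbol that exists only inside the calling function is not found.

```r
# Hypothetical stand-in for tree(): it only records the call it was made with.
fit_like <- function(data) {
  structure(list(call = match.call()), class = "fit_like")
}

# Hypothetical stand-in for model.frame(): it re-evaluates the stored data
# argument. eval() with no envir uses the caller's frame, i.e. refit_like's
# own frame, and then the environment where refit_like was defined.
refit_like <- function(object) {
  oc <- object$call
  eval(oc$data)
}

# Inside a wrapper the stored call refers to the symbol `train`, which lives
# only in wrapper's frame and is not visible from refit_like().
wrapper <- function(train) {
  fit <- fit_like(train)
  tryCatch(refit_like(fit), error = function(e) conditionMessage(e))
}

dat <- data.frame(x = 1:3)

# Called through the wrapper: the lookup of `train` fails.
msg <- wrapper(dat)

# Called at the top level: the stored symbol is `dat`, which is found.
top <- refit_like(fit_like(dat))
```

Here msg holds the error message from the failed lookup, while top is the data frame itself, which mirrors the difference between calling the code from a function and from the script.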

Annnnd: I'm back where I started. It's an environment and lazy-evaluation issue.
The fix is to use force:

func <- function(train) {
    force(train)
    classification.tree <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, train, split = "gini")
    cv.tree(classification.tree, FUN=prune.tree, K = 4)
    return (classification.tree)
}

This makes "train" exist in the environment available to cv.tree and the functions it calls. Environments can get weird :-) ; this is one example of that.
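
For readers unfamiliar with lazy evaluation, here is a minimal base R sketch of what force() does to a function argument. The helper names (note, lazy, eager, events) are made up for illustration. It only shows that force() evaluates the argument's promise at that point; it says nothing about where a later eval() inside another package will look for a symbol, so whether it is enough for cv.tree depends on where that package evaluates the stored call.

```r
# Record the order in which things get evaluated.
events <- character()
note <- function(tag) {
  events <<- c(events, tag)
  tag
}

# Without force(): the argument is evaluated only when it is first used.
lazy <- function(arg) {
  note("body")
  arg
}

# With force(): the argument is evaluated before the rest of the body runs.
eager <- function(arg) {
  force(arg)
  note("body")
  arg
}

res_lazy <- lazy(note("arg"))
lazy_order <- events

events <- character()
res_eager <- eager(note("arg"))
eager_order <- events
```

In the lazy version the body runs before the argument is evaluated; in the eager version the order is reversed.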

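It turns out this particular package requires the dataset to be a global variable! How's that for an argument in favor of using a general-purpose programming language? Anyway, the code below seems to work: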

library(tree)

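# Placeholder; func() overwrites this global with its argument
train_global <- NA
func <- function(train) {
  # Copy the argument into a global variable so that cv.tree can find it
  train_global <<- train
  t <- tree(formula=log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, data=train_global, split = "gini")
  cv.tree(t, ,FUN=prune.tree, K = 4)
  return (t)
}
data(cpus, package="MASS")
cpus <- cpus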
result <- func(cpus)
plot(result)
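
A possible alternative to a global variable, sketched in base R only. fit_like() and refit_like() are hypothetical stand-ins for a fitting function that records its call and a helper that re-evaluates it; they are not the tree package's code. Building the call with do.call() stores the data frame itself in the recorded call instead of the symbol train, so re-evaluating the call no longer needs a name lookup. Whether this carries over to tree() and cv.tree() is an assumption that has not been checked here.

```r
# Hypothetical stand-in for a fitting function that records its call.
fit_like <- function(data) {
  structure(list(call = match.call()), class = "fit_like")
}

# Hypothetical stand-in for a helper that re-evaluates the stored data argument.
refit_like <- function(object) {
  eval(object$call$data)
}

wrapper <- function(train) {
  # do.call() evaluates `train` first, so the recorded call contains the
  # data frame value rather than the symbol `train`.
  fit <- do.call(fit_like, list(data = train))
  refit_like(fit)
}

dat <- data.frame(x = 1:3)
out <- wrapper(dat)
```

The trade-off is that the recorded call now carries a full copy of the data, which makes printed calls long.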