Lazy evaluation for ggplot2 inside a function

Question

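I mainly use ggplot2 for visualization. Typically, I design a plot interactively (i.e. with raw ggplot2 code that relies on NSE), but in the end I usually wrap that code in a function that receives the data and the variables to plot. This is always a bit of a nightmare.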

So, the typical situation is this. I have some data and I create a plot for it (in this case a very, very simple example, using the mpg dataset that ships with ggplot2).

library(ggplot2)
data(mpg)

ggplot(data = mpg, 
       mapping = aes(x = class, y = hwy)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")

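Once I have finished designing the plot, I usually want to reuse it for different variables or data. So I create a function that receives the data and the variables to plot as arguments. But because of NSE, it is not as easy as writing the function header and then copy/pasting the code and replacing the variables with the function arguments. That does not work, as shown below.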

mpg <- mpg
plotfn <- function(data, xvar, yvar){
    ggplot(data = data, 
           mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, class, hwy) # Can't find object

## Don't know how to automatically pick scale for object of type function. Defaulting to continuous.

## Warning: restarting interrupted promise evaluation

## Error in eval(expr, envir, enclos): object 'hwy' not found

plotfn(mpg, "class", "hwy") # Does not plot the columns either: the strings are treated as constants
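The failure above can be reproduced without ggplot2. The following is a minimal base-R sketch in which `capture()` is a hypothetical stand-in for an NSE function such as `aes()`; `wrapper()` and `wrapper_fixed()` are likewise illustrative names, not part of any package. It shows that an NSE function called inside a wrapper only sees the wrapper's argument name, unless the caller's expression is spliced into the call first.

```r
# Stand-in for an NSE function such as aes(): it returns the expression
# it was given rather than the value of that expression
capture <- function(x) substitute(x)

# Naive wrapper: the inner function only ever sees the symbol `xvar`
wrapper <- function(xvar) capture(xvar)
wrapper(class)        # returns the symbol xvar

# Wrapper that splices the caller's expression into the inner call first
wrapper_fixed <- function(xvar) {
  inner_call <- substitute(capture(v), list(v = substitute(xvar)))
  eval(inner_call)
}
wrapper_fixed(class)  # returns the symbol class
```

In the ggplot2 case, the captured symbol `xvar` is only evaluated when the plot is built, which is where the lookup of `class` and `hwy` outside the data goes wrong.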

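So I have to go back and fix the code, for example by using aes_string instead of aes, which uses NSE. (In this example it is easy, but for more complicated plots with lots of transformations and layers this becomes a nightmare.)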

plotfn <- function(data, xvar, yvar){
    ggplot(data = data, 
           mapping = aes_string(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, "class", "hwy") # Now this works

The problem is that I find NSE and lazyeval very convenient. So I like to do something like this.

mpg <- mpg
plotfn <- function(data, xvar, yvar){
    data_gd <- data.frame(
        xvar = lazyeval::lazy_eval(substitute(xvar), data = data),
        yvar = lazyeval::lazy_eval(substitute(yvar), data = data))

    ggplot(data = data_gd, 
           mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, class, hwy) # Now this works

plotfn(mpg, "class", "hwy") # This still works

plotfn(NULL, rep(letters[1:4], 250), 1:100) # And even this craziness works

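This gives my plotting functions a lot of flexibility. For example, you can pass quoted or unquoted variable names, or even the data directly instead of a variable name (kind of abusing lazy evaluation).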

But this has one big problem: the function cannot be used programmatically.

dynamically_changing_xvar <- "class"
plotfn(mpg, dynamically_changing_xvar, hwy) 

## Error in eval(expr, envir, enclos): object 'dynamically_changing_xvar' not found

# This does not work, because it never finds the object 
# dynamically_changing_xvar in the data, and it does not get evaluated to 
# obtain the variable name (class)

So I cannot use loops (e.g. lapply) over different combinations of variables or data.
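A small base-R sketch of why the lookup fails, using a made-up three-row data frame `df` in place of mpg. It assumes that lazyeval evaluates a bare expression against the data with the base environment as the enclosure (its documented default for `as.lazy()` as far as I can tell), so a variable defined by the caller is never visible. Indexing the data by the variable's value is the standard-evaluation counterpart.

```r
# Made-up stand-in for mpg
df <- data.frame(class = c("a", "b", "a"), hwy = c(29, 31, 26),
                 stringsAsFactors = FALSE)
dynamically_changing_xvar <- "class"

# Evaluating the captured symbol against the data, with baseenv() as the
# enclosure (assumption: this mirrors what lazyeval does for bare
# expressions), cannot see the caller's variable and signals an error
res <- tryCatch(
  eval(quote(dynamically_changing_xvar), envir = df, enclos = baseenv()),
  error = function(e) conditionMessage(e)
)
res

# Standard evaluation: use the variable's *value* as a column name
df[[dynamically_changing_xvar]]
```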

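So I thought I would abuse lazy, standard and non-standard evaluation even more, and try to combine them all so that I have both the flexibility shown above and the ability to use the function programmatically. Basically, I use tryCatch to first lazy_eval the expression for each variable and, if that fails, to evaluate the parsed expression.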

plotfn <- function(data, xvar, yvar){
    data_gd <- NULL
    data_gd$xvar <- tryCatch(
        expr = lazyeval::lazy_eval(substitute(xvar), data = data),
        error = function(e) eval(envir = data, expr = parse(text=xvar))
    )
    data_gd$yvar <- tryCatch(
        expr = lazyeval::lazy_eval(substitute(yvar), data = data),
        error = function(e) eval(envir = data, expr = parse(text=yvar))
    )


    ggplot(data = as.data.frame(data_gd), 
           mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}

plotfn(mpg, class, hwy) # Now this works, again

plotfn(mpg, "class", "hwy") # This still works, again

plotfn(NULL, rep(letters[1:4], 250), 1:100) # And this craziness still works

# And now, I can also pass a local variable to the function, that contains
# the name of the variable that I want to plot
dynamically_changing_xvar <- "class"
plotfn(mpg, dynamically_changing_xvar, hwy)

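So, on top of the flexibility mentioned before, I can now use a one-liner or so to produce many copies of the same plot with different variables (or data).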

lapply(c("class", "fl", "drv"), FUN = plotfn, yvar = hwy, data = mpg)

## [[1]]

## 
## [[2]]

## 
## [[3]]

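Even though it is very practical, I suspect this is not good practice. But how bad a practice is it? That is my key question. What other alternatives could I use to get the best of both worlds?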

Of course, I can see that this pattern can create problems. For example:

# If I have a variable in the global environment that contains the variable
# I want to plot, but whose name is in the data passed to the function, 
# then it will use the name of the variable and not its content
drv <- "class"
plotfn(mpg, drv, hwy) # Here xvar on the plot is drv and not class

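There are some (many?) other problems as well. But it seems to me that the gain in syntax flexibility outweighs those other issues. Any thoughts on this?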

Answer 1

For clarity, here is the function you propose:

library(ggplot2)
data(mpg)

plotfn <- function(data, xvar, yvar){
  data_gd <- NULL
  data_gd$xvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(xvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text=xvar))
  )
  data_gd$yvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(yvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text=yvar))
  )

  ggplot(data = as.data.frame(data_gd), 
         mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}

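Such a function is generally quite useful, since you can freely mix strings and bare variable names. But, as you say, it may not always be safe. Consider the following contrived example: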

class <- "drv"
Class <- "drv"
plotfn(mpg, class, hwy) 
plotfn(mpg, Class, hwy)

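What will your function generate? Are these the same (they are not)? It is not clear to me what the result would be. Programming with such a function can give unexpected results, depending on which variables exist in data and which exist in the environment. Since a lot of people use variable names like x, xvar or count (even though they perhaps shouldn't), things can get messy.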
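To make the difference concrete without drawing anything, here is a rough base-R sketch of the lookup order in the function above. `resolve_var()` is a hypothetical helper, `df` is a made-up two-row data frame, and the sketch assumes that lazyeval parses character input and evaluates against the data with the base environment as the enclosure.

```r
# Hypothetical helper mirroring the lookup order of plotfn(): first try the
# captured expression against the data, then fall back to treating the
# argument's *value* as an expression to evaluate in the data
resolve_var <- function(data, var) {
  expr <- substitute(var)
  # Assumption: lazyeval parses strings, so a quoted name becomes a symbol
  if (is.character(expr)) expr <- parse(text = expr)[[1]]
  tryCatch(
    eval(expr, envir = data, enclos = baseenv()),
    error = function(e) eval(parse(text = var), envir = data)
  )
}

df <- data.frame(class = c("a", "b"), drv = c("f", "r"),
                 stringsAsFactors = FALSE)
class <- "drv"
Class <- "drv"

resolve_var(df, class)  # "a" "b": the column `class` shadows the variable
resolve_var(df, Class)  # "f" "r": no such column, so the value "drv" is used
```

Two variables holding the same string therefore select different columns, purely because one of them happens to share its name with a column.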

Also, if I wanted to force one interpretation of class or the other, I could not.

I would say it is somewhat similar to using attach: convenient, but at some point it might bite you in the ass.

Therefore, I would use an NSE and SE pair:

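plotfn <- function(data, xvar, yvar) {
  # Capture each unevaluated argument, evaluate it against the data,
  # and hand the result to the SE version
  plotfn_(data,
          lazyeval::lazy_eval(substitute(xvar), data = data),
          lazyeval::lazy_eval(substitute(yvar), data = data))
}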

plotfn_ <- function(data, xvar, yvar){
  ggplot(data = data, 
         mapping = aes_(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}

I think creating these is actually easier than your function. You could also opt to capture all arguments lazily with lazy_dots.

Now we get results that are easier to predict when using the safe SE version:

class <- "drv"
Class <- "drv"
plotfn_(mpg, class, 'hwy')
plotfn_(mpg, Class, 'hwy')

The NSE version is still affected:

plotfn(mpg, class, hwy)
plotfn(mpg, Class, hwy)

(I find it a bit annoying that ggplot2::aes_ does not also accept strings.)
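One possible way around the string limitation, sketched here under the assumption of ggplot2 3.0.0 or later with a recent rlang (this goes beyond the lazyeval approach discussed above), is the `.data` pronoun for string input and the embrace operator `{{ }}` for bare names. `plotfn_se()` and `plotfn_nse()` are illustrative names only.

```r
library(ggplot2)

# SE version: column names arrive as strings and are looked up via `.data`
plotfn_se <- function(data, xvar, yvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

# NSE version: bare column names are forwarded with the embrace operator
plotfn_nse <- function(data, xvar, yvar) {
  ggplot(data, aes(x = {{ xvar }}, y = {{ yvar }})) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

# Programmatic use goes through the SE version
plots <- lapply(c("class", "fl", "drv"), plotfn_se, data = mpg, yvar = "hwy")

# Interactive use goes through the NSE version
p <- plotfn_nse(mpg, class, hwy)
```

Keeping the two entry points separate means the caller states which interpretation is meant, so a variable named class in the calling environment cannot silently change the plot.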
