ggplot2 自定义统计数据在分面时未显示

Question

我正在尝试为 ggplot2 编写自定义 stat_*，我想在其中使用瓷砖为 2D 黄土表面着色。当我 start from the extension guide 时，我可以像他们那样写一个 stat_chull：

stat_chull = function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  chull = ggproto("chull", Stat,
    compute_group = function(data, scales) {
      data[chull(data$x, data$y), , drop = FALSE]
    },
    required_aes = c("x", "y")
  )

  layer(
    stat = chull, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

这对简单调用和 facet 包装都有效：

ggplot(mpg, aes(x=displ, y=hwy)) + 
  geom_point() + 
  stat_chull()
# optionally + facet_wrap(~ class)

当我写我的 stat_loess2d 时，我还可以想象所有 classes 或个人 class:

stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  loess2d = ggproto("loess2d", Stat,
    compute_group = function(data, scales) {
      dens = MASS::kde2d(data$x, data$y)
      lsurf = loess(fill ~ x + y, data=data)
      df = data.frame(x = rep(dens$x, length(dens$y)),
                      y = rep(dens$y, each=length(dens$x)),
                      dens = c(dens$z))
      df$fill = predict(lsurf, newdata=df[c("x", "y")])
      df
    },
    required_aes = c("x", "y", "fill")
  )

  layer(
    stat = loess2d, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

但是，当我尝试对上面的内容进行分面时，不再显示图块：

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)

有人能告诉我我做错了什么吗？

Answer 1

说明

我在这里看到的主要问题实际上超出了您所做的范围，并且与 geom_tile 如何在特定 x / y 轴值显着不同时跨不同方面处理图块创建有关。 older question 展示了类似的现象：geom_tile 可以很好地处理每个方面的数据，但将它们放在一起，瓷砖会缩小以匹配不同方面的值之间的最小差异。这会在绘图层中留下大量白色 space，并且通常随着每个额外的面逐渐变得更糟，直到瓷砖本身变得几乎不可见。

为了解决这个问题，我会在每个方面的密度/黄土计算后添加一个数据处理步骤，以标准化所有方面的 x 和 y 值的范围。

如果您不太熟悉 compute_layer、compute_panel 和 compute_group 之间的关系，请进行一些详细说明（当我开始摆弄 ggproto 对象时，我当然不是...):

本质上，所有 Stat* 对象都具有这三个功能来弥合给定数据帧（在本例中为 mpg）与 [=19 接收到的数据之间的差距=] 事情的一面。

在这三个函数中，compute_layer 是顶级函数，通常会触发 compute_panel 为每个面/面板计算单独的数据框（导出中使用的术语functions 是 facet，但底层包代码与面板相同；我也不确定为什么）。反过来，compute_panel 触发 compute_group 为每个组计算一个单独的数据框（由 group / colour / fill / 等美学参数定义） .

compute_group 的结果返回到 compute_panel 并组合成一个数据帧。同样，compute_layer 从每个方面的 compute_panel 接收一个数据帧，并将它们再次组合在一起。然后将合并的数据帧传递给 Geom* 进行绘制。

（以上是在顶级 Stat 中定义的通用设置。从 Stat 继承的其他 Stat* 对象可能会覆盖任何步骤。例如，StatIdentity 的 compute_layer returns 原样的原始数据帧，根本不会触发 compute_panel / compute_group，因为那里对于未更改的数据不需要这样做。）

对于此用例，我们可以修改 compute_layer 中的代码，在 compute_panel / compute_group 返回结果并组合在一起后，插入与每个相关联的值小平面进入公共垃圾箱。因为普通垃圾箱 = 漂亮的大瓷砖，中间没有白色 space。

修改

下面是我编写 loess2d ggproto 对象的方式，以及 compute_layer 的附加定义：

loess2d = ggproto("loess2d", Stat, compute_group = function(data, scales) { dens = MASS::kde2d(data$x, data$y) lsurf = loess(fill ~ x + y, data=data) df = data.frame(x = rep(dens$x, length(dens$y)), y = rep(dens$y, each=length(dens$x)), dens = c(dens$z)) df$fill = predict(lsurf, newdata=df[c("x", "y")]) df }, compute_layer = function(self, data, params, layout) { # no change from Stat$compute_layer in this chunk, except # for liberal usage of `ggplot2:::` to utilise un-exported # functions from the package ggplot2:::check_required_aesthetics(self$required_aes, c(names(data), names(params)), ggplot2:::snake_class(self)) data <- remove_missing(data, params$na.rm, c(self$required_aes, self$non_missing_aes), ggplot2:::snake_class(self), finite = TRUE) params <- params[intersect(names(params), self$parameters())] args <- c(list(data = quote(data), scales = quote(scales)), params) df <- plyr::ddply(data, "PANEL", function(data) { scales <- layout$get_scales(data$PANEL[1]) tryCatch(do.call(self$compute_panel, args), error = function(e) { warning("Computation failed in `", ggplot2:::snake_class(self), "()`:\n", e$message, call. = FALSE) data.frame() }) }) # define common x/y grid range across all panels # (length = 25 chosen to match the default value for n in MASS::kde2d) x.range <- seq(min(df$x), max(df$x), length = 25) y.range <- seq(min(df$y), max(df$y), length = 25) # interpolate each panel's data to a common grid, # with NA values for regions where each panel doesn't # have data (this can be changed via the extrap # parameter in akima::interp, but I think # extrapolating may create misleading visuals) df <- df %>% tidyr::nest(-PANEL) %>% mutate(data = purrr::map(data, ~akima::interp(x = .x$x, y = .x$y, z = .x$fill, xo = x.range, yo = y.range, nx = 25, ny = 25) %>% akima::interp2xyz(data.frame = TRUE) %>% rename(fill = z))) %>% tidyr::unnest() return(df) }, required_aes = c("x", "y", "fill") )

用法：

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + stat_loess2d() + facet_wrap(~ class) # this does trigger warnings (not errors) because some of the facets contain # really very few observations. if we filter for facets with more rows of data # in the original dataset, this wouldn't be an issue ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")), aes(x=displ, y=hwy, fill=year)) + stat_loess2d() + facet_wrap(~ class) # no warnings triggered

ggplot2 自定义统计数据在分面时未显示

ggplot2 custom stat not shown when facetting

r

ggplot2

ggproto

说明

修改