ggplot2 自定义统计数据在分面时未显示
ggplot2 custom stat not shown when facetting
我正在尝试为 ggplot2
编写自定义 stat_*
,我想在其中使用瓷砖为 2D 黄土表面着色。当我 start from the extension guide 时,我可以像他们那样写一个 stat_chull:
stat_chull = function(mapping = NULL, data = NULL, geom = "polygon",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
chull = ggproto("chull", Stat,
compute_group = function(data, scales) {
data[chull(data$x, data$y), , drop = FALSE]
},
required_aes = c("x", "y")
)
layer(
stat = chull, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
这对简单调用和 facet 包装都有效:
ggplot(mpg, aes(x=displ, y=hwy)) +
geom_point() +
stat_chull()
# optionally + facet_wrap(~ class)
当我写我的 stat_loess2d
时,我还可以想象所有 classes 或个人 class:
stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
loess2d = ggproto("loess2d", Stat,
compute_group = function(data, scales) {
dens = MASS::kde2d(data$x, data$y)
lsurf = loess(fill ~ x + y, data=data)
df = data.frame(x = rep(dens$x, length(dens$y)),
y = rep(dens$y, each=length(dens$x)),
dens = c(dens$z))
df$fill = predict(lsurf, newdata=df[c("x", "y")])
df
},
required_aes = c("x", "y", "fill")
)
layer(
stat = loess2d, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(mpg, aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d()
ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d()
但是,当我尝试对上面的内容进行分面时,不再显示图块:
ggplot(mpg, aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d() +
facet_wrap(~ class)
有人能告诉我我做错了什么吗?
说明
我在这里看到的主要问题实际上超出了您所做的范围,并且与 geom_tile
如何在特定 x / y 轴值显着不同时跨不同方面处理图块创建有关。 older question 展示了类似的现象:geom_tile
可以很好地处理每个方面的数据,但将它们放在一起,瓷砖会缩小以匹配不同方面的值之间的最小差异。这会在绘图层中留下大量白色 space,并且通常随着每个额外的面逐渐变得更糟,直到瓷砖本身变得几乎不可见。
为了解决这个问题,我会在每个方面的密度/黄土计算后添加一个数据处理步骤,以标准化所有方面的 x 和 y 值的范围。
如果您不太熟悉 compute_layer
、compute_panel
和 compute_group
之间的关系,请进行一些详细说明(当我开始摆弄 ggproto 对象时,我当然不是...):
本质上,所有 Stat*
对象都具有这三个功能来弥合给定数据帧(在本例中为 mpg
)与 [=19 接收到的数据之间的差距=] 事情的一面。
在这三个函数中,compute_layer
是顶级函数,通常会触发 compute_panel
为每个面/面板计算单独的数据框(导出中使用的术语functions 是 facet,但底层包代码与面板相同;我也不确定为什么)。反过来,compute_panel
触发 compute_group
为每个组计算一个单独的数据框(由 group
/ colour
/ fill
/ 等美学参数定义) .
compute_group
的结果返回到 compute_panel
并组合成一个数据帧。同样,compute_layer
从每个方面的 compute_panel
接收一个数据帧,并将它们再次组合在一起。然后将合并的数据帧传递给 Geom*
进行绘制。
(以上是在顶级 Stat
中定义的通用设置。从 Stat
继承的其他 Stat*
对象可能会覆盖任何步骤。例如,StatIdentity
的 compute_layer
returns 原样的原始数据帧,根本不会触发 compute_panel
/ compute_group
,因为那里对于未更改的数据不需要这样做。)
对于此用例,我们可以修改 compute_layer
中的代码,在 compute_panel
/ compute_group
返回结果并组合在一起后,插入与每个相关联的值小平面进入公共垃圾箱。因为普通垃圾箱 = 漂亮的大瓷砖,中间没有白色 space。
修改
下面是我编写 loess2d
ggproto 对象的方式,以及 compute_layer
的附加定义:
loess2d = ggproto("loess2d", Stat,
compute_group = function(data, scales) {
dens = MASS::kde2d(data$x, data$y)
lsurf = loess(fill ~ x + y, data=data)
df = data.frame(x = rep(dens$x, length(dens$y)),
y = rep(dens$y, each=length(dens$x)),
dens = c(dens$z))
df$fill = predict(lsurf, newdata=df[c("x", "y")])
df
},
compute_layer = function(self, data, params, layout) {
# no change from Stat$compute_layer in this chunk, except
# for liberal usage of `ggplot2:::` to utilise un-exported
# functions from the package
ggplot2:::check_required_aesthetics(self$required_aes,
c(names(data), names(params)),
ggplot2:::snake_class(self))
data <- remove_missing(data, params$na.rm,
c(self$required_aes, self$non_missing_aes),
ggplot2:::snake_class(self),
finite = TRUE)
params <- params[intersect(names(params), self$parameters())]
args <- c(list(data = quote(data), scales = quote(scales)), params)
df <- plyr::ddply(data, "PANEL", function(data) {
scales <- layout$get_scales(data$PANEL[1])
tryCatch(do.call(self$compute_panel, args),
error = function(e) {
warning("Computation failed in `", ggplot2:::snake_class(self),
"()`:\n", e$message, call. = FALSE)
data.frame()
})
})
# define common x/y grid range across all panels
# (length = 25 chosen to match the default value for n in MASS::kde2d)
x.range <- seq(min(df$x), max(df$x), length = 25)
y.range <- seq(min(df$y), max(df$y), length = 25)
# interpolate each panel's data to a common grid,
# with NA values for regions where each panel doesn't
# have data (this can be changed via the extrap
# parameter in akima::interp, but I think
# extrapolating may create misleading visuals)
df <- df %>%
tidyr::nest(-PANEL) %>%
mutate(data = purrr::map(data,
~akima::interp(x = .x$x, y = .x$y, z = .x$fill,
xo = x.range, yo = y.range,
nx = 25, ny = 25) %>%
akima::interp2xyz(data.frame = TRUE) %>%
rename(fill = z))) %>%
tidyr::unnest()
return(df)
},
required_aes = c("x", "y", "fill")
)
用法:
ggplot(mpg,
aes(x=displ, y=hwy, fill=year)) +
stat_loess2d() +
facet_wrap(~ class)
# this does trigger warnings (not errors) because some of the facets contain
# really very few observations. if we filter for facets with more rows of data
# in the original dataset, this wouldn't be an issue
ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")),
aes(x=displ, y=hwy, fill=year)) +
stat_loess2d() +
facet_wrap(~ class)
# no warnings triggered
我正在尝试为 ggplot2
编写自定义 stat_*
,我想在其中使用瓷砖为 2D 黄土表面着色。当我 start from the extension guide 时,我可以像他们那样写一个 stat_chull:
stat_chull = function(mapping = NULL, data = NULL, geom = "polygon",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
chull = ggproto("chull", Stat,
compute_group = function(data, scales) {
data[chull(data$x, data$y), , drop = FALSE]
},
required_aes = c("x", "y")
)
layer(
stat = chull, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
这对简单调用和 facet 包装都有效:
ggplot(mpg, aes(x=displ, y=hwy)) +
geom_point() +
stat_chull()
# optionally + facet_wrap(~ class)
当我写我的 stat_loess2d
时,我还可以想象所有 classes 或个人 class:
stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
loess2d = ggproto("loess2d", Stat,
compute_group = function(data, scales) {
dens = MASS::kde2d(data$x, data$y)
lsurf = loess(fill ~ x + y, data=data)
df = data.frame(x = rep(dens$x, length(dens$y)),
y = rep(dens$y, each=length(dens$x)),
dens = c(dens$z))
df$fill = predict(lsurf, newdata=df[c("x", "y")])
df
},
required_aes = c("x", "y", "fill")
)
layer(
stat = loess2d, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(mpg, aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d()
ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d()
但是,当我尝试对上面的内容进行分面时,不再显示图块:
ggplot(mpg, aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d() +
facet_wrap(~ class)
有人能告诉我我做错了什么吗?
说明
我在这里看到的主要问题实际上超出了您所做的范围,并且与 geom_tile
如何在特定 x / y 轴值显着不同时跨不同方面处理图块创建有关。 older question 展示了类似的现象:geom_tile
可以很好地处理每个方面的数据,但将它们放在一起,瓷砖会缩小以匹配不同方面的值之间的最小差异。这会在绘图层中留下大量白色 space,并且通常随着每个额外的面逐渐变得更糟,直到瓷砖本身变得几乎不可见。
为了解决这个问题,我会在每个方面的密度/黄土计算后添加一个数据处理步骤,以标准化所有方面的 x 和 y 值的范围。
如果您不太熟悉 compute_layer
、compute_panel
和 compute_group
之间的关系,请进行一些详细说明(当我开始摆弄 ggproto 对象时,我当然不是...):
本质上,所有
Stat*
对象都具有这三个功能来弥合给定数据帧(在本例中为mpg
)与 [=19 接收到的数据之间的差距=] 事情的一面。在这三个函数中,
compute_layer
是顶级函数,通常会触发compute_panel
为每个面/面板计算单独的数据框(导出中使用的术语functions 是 facet,但底层包代码与面板相同;我也不确定为什么)。反过来,compute_panel
触发compute_group
为每个组计算一个单独的数据框(由group
/colour
/fill
/ 等美学参数定义) .compute_group
的结果返回到compute_panel
并组合成一个数据帧。同样,compute_layer
从每个方面的compute_panel
接收一个数据帧,并将它们再次组合在一起。然后将合并的数据帧传递给Geom*
进行绘制。
(以上是在顶级 Stat
中定义的通用设置。从 Stat
继承的其他 Stat*
对象可能会覆盖任何步骤。例如,StatIdentity
的 compute_layer
returns 原样的原始数据帧,根本不会触发 compute_panel
/ compute_group
,因为那里对于未更改的数据不需要这样做。)
对于此用例,我们可以修改 compute_layer
中的代码,在 compute_panel
/ compute_group
返回结果并组合在一起后,插入与每个相关联的值小平面进入公共垃圾箱。因为普通垃圾箱 = 漂亮的大瓷砖,中间没有白色 space。
修改
下面是我编写 loess2d
ggproto 对象的方式,以及 compute_layer
的附加定义:
loess2d = ggproto("loess2d", Stat,
compute_group = function(data, scales) {
dens = MASS::kde2d(data$x, data$y)
lsurf = loess(fill ~ x + y, data=data)
df = data.frame(x = rep(dens$x, length(dens$y)),
y = rep(dens$y, each=length(dens$x)),
dens = c(dens$z))
df$fill = predict(lsurf, newdata=df[c("x", "y")])
df
},
compute_layer = function(self, data, params, layout) {
# no change from Stat$compute_layer in this chunk, except
# for liberal usage of `ggplot2:::` to utilise un-exported
# functions from the package
ggplot2:::check_required_aesthetics(self$required_aes,
c(names(data), names(params)),
ggplot2:::snake_class(self))
data <- remove_missing(data, params$na.rm,
c(self$required_aes, self$non_missing_aes),
ggplot2:::snake_class(self),
finite = TRUE)
params <- params[intersect(names(params), self$parameters())]
args <- c(list(data = quote(data), scales = quote(scales)), params)
df <- plyr::ddply(data, "PANEL", function(data) {
scales <- layout$get_scales(data$PANEL[1])
tryCatch(do.call(self$compute_panel, args),
error = function(e) {
warning("Computation failed in `", ggplot2:::snake_class(self),
"()`:\n", e$message, call. = FALSE)
data.frame()
})
})
# define common x/y grid range across all panels
# (length = 25 chosen to match the default value for n in MASS::kde2d)
x.range <- seq(min(df$x), max(df$x), length = 25)
y.range <- seq(min(df$y), max(df$y), length = 25)
# interpolate each panel's data to a common grid,
# with NA values for regions where each panel doesn't
# have data (this can be changed via the extrap
# parameter in akima::interp, but I think
# extrapolating may create misleading visuals)
df <- df %>%
tidyr::nest(-PANEL) %>%
mutate(data = purrr::map(data,
~akima::interp(x = .x$x, y = .x$y, z = .x$fill,
xo = x.range, yo = y.range,
nx = 25, ny = 25) %>%
akima::interp2xyz(data.frame = TRUE) %>%
rename(fill = z))) %>%
tidyr::unnest()
return(df)
},
required_aes = c("x", "y", "fill")
)
用法:
ggplot(mpg,
aes(x=displ, y=hwy, fill=year)) +
stat_loess2d() +
facet_wrap(~ class)
# this does trigger warnings (not errors) because some of the facets contain
# really very few observations. if we filter for facets with more rows of data
# in the original dataset, this wouldn't be an issue
ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")),
aes(x=displ, y=hwy, fill=year)) +
stat_loess2d() +
facet_wrap(~ class)
# no warnings triggered