Plot multiple series of data into a single bagplot with R

Question

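Consider the bagplot example that ships with the aplpack library for R. A bagplot is a bivariate generalization of the boxplot, so it gives insight into how the data points are distributed along both axes.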

Example of a bagplot:

Code of the example:

  library(aplpack)  # provides bagplot()
  # example of Rousseeuw et al., see R-package rpart
  cardata <- structure(as.integer( c(2560,2345,1845,2260,2440,
   2285, 2275, 2350, 2295, 1900, 2390, 2075, 2330, 3320, 2885,
   3310, 2695, 2170, 2710, 2775, 2840, 2485, 2670, 2640, 2655,
   3065, 2750, 2920, 2780, 2745, 3110, 2920, 2645, 2575, 2935,
   2920, 2985, 3265, 2880, 2975, 3450, 3145, 3190, 3610, 2885,
   3480, 3200, 2765, 3220, 3480, 3325, 3855, 3850, 3195, 3735,
   3665, 3735, 3415, 3185, 3690, 97, 114, 81, 91, 113, 97, 97,
   98, 109, 73, 97, 89, 109, 305, 153, 302, 133, 97, 125, 146,
   107, 109, 121, 151, 133, 181, 141, 132, 133, 122, 181, 146,
   151, 116, 135, 122, 141, 163, 151, 153, 202, 180, 182, 232,
   143, 180, 180, 151, 189, 180, 231, 305, 302, 151, 202, 182,
   181, 143, 146, 146)), .Dim = as.integer(c(60, 2)), 
   .Dimnames = list(NULL, c("Weight", "Disp.")))
  bagplot(cardata,factor=3,show.baghull=TRUE,
    show.loophull=TRUE,precision=1,dkmethod=2)
  title("car data Chambers/Hastie 1992")
  # points of y=x*x
  bagplot(x=1:30,y=(1:30)^2,verbose=FALSE,dkmethod=2)

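The bagplot of aplpack seems to support plotting a "bag" for a single data series only. It would be more interesting to plot two (or three) data series within a single bagplot, where comparing the "bags" visually shows how the distributions of the series differ. Does anyone know whether (and if so, how) this can be done in R?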

Answer 1

If we modify some of the aplpack::bagplot code, we can create a new geom for ggplot2. Then we can compare the groups in a dataset in the usual ggplot2 way. Here is an example:

library(ggplot2)
# geom_bag() is not part of ggplot2; define it first with the code linked below
ggplot(iris, aes(Sepal.Length, Sepal.Width,
                 colour = Species, fill = Species)) +
  geom_bag() +
  theme_minimal()

We can also show the points together with the bagplot:

ggplot(iris, aes(Sepal.Length, Sepal.Width,
                 colour = Species, fill = Species)) +
  geom_bag() +
  geom_point() +
  theme_minimal()

Here is the code for geom_bag and the modified aplpack::bagplot function: https://gist.github.com/benmarwick/00772ccea2dd0b0f1745
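
If sourcing the gist is not an option, a rough base-graphics alternative might be to overlay one bagplot per series on a shared plot. The sketch below is not the geom_bag approach. It assumes that aplpack::bagplot accepts add = TRUE to draw onto an existing plot and that its col.* arguments take semi-transparent colours. The two series A and B, their colours, and the variable names are made up for illustration.

```r
library(aplpack)  # provides bagplot()

set.seed(1)  # reproducible, made-up data for two hypothetical series
series <- list(
  A = cbind(x = rnorm(100, mean = 0), y = rnorm(100, mean = 0)),
  B = cbind(x = rnorm(100, mean = 2), y = rnorm(100, mean = 1))
)
series_cols <- c(A = "firebrick", B = "steelblue")  # arbitrary colour choice

# Empty plot spanning all series, so that no bag or loop is clipped.
all_xy <- do.call(rbind, series)
plot(all_xy, type = "n", xlab = "x", ylab = "y")

# Draw one bagplot per series on top of the shared axes.
for (s in names(series)) {
  xy <- series[[s]]
  bagplot(xy[, "x"], xy[, "y"],
          add = TRUE,  # assumption: adds to the current plot
          col.baghull    = adjustcolor(series_cols[[s]], alpha.f = 0.5),
          col.loophull   = adjustcolor(series_cols[[s]], alpha.f = 0.2),
          col.bagpoints  = series_cols[[s]],
          col.looppoints = series_cols[[s]])
}
legend("topleft", legend = names(series), fill = series_cols, bty = "n")
```

The semi-transparent fills come from adjustcolor(), so overlapping bags stay visible. Compared with geom_bag, this gives no automatic legend or faceting, which is the main reason to prefer the ggplot2 route for more than a couple of series.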
