R, ggplot, How do I keep related points together when using jitter?

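One of the variables in my data frame is a factor denoting whether an amount was gained or spent. Every event has a "gain" value; there may or may not be a corresponding "spend" amount. Here is an image with the observations over-plotted: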

Adding some random jitter helps visually; however, the "spend" amounts become separated from their corresponding gain events:

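I'd like to see the blue circles "bullseyed" inside their gain circles (where the "id" is equal), and jittered as a pair. Here is some sample data (three days) and code: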

library(ggplot2)
ccode<-c(Gain="darkseagreen",Spend="darkblue")
ef<-data.frame(
  date=as.Date(c("2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03")),
  site=c("Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace","Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace"),
  id=c("C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99","C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99"),
  gainspend=c("Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend"),
  amount=c(6,14,34,31,3,10,6,14,2,16,16,14,1,1,15,11,8,7,2,10,15,4,3,NA,NA,4,5,NA,NA,NA,NA,NA,NA,2,NA,1,NA,3,NA,NA,2,NA,NA,2,NA,3))
#▼ 3 day, points centered
ggplot(ef,aes(date,site)) + 
  geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
  scale_color_manual(values=ccode) +
  scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
#▼ 3 day, jittered
ggplot(ef,aes(date,site)) + 
  geom_point(aes(size=amount,color=gainspend),alpha=0.5,position=position_jitter(width=0,height=0.2)) +
  scale_color_manual(values=ccode) +
  scale_size_continuous(range=c(1,15),breaks=c(5,10,20))

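My main idea is the old "add jitter manually" approach. I'm wondering whether there is a better way, something like drawing little pie charts as points, à la the scatterpie package.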

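In this case, you could draw one random jitter amount per ID, so that the points within a group all move by the same amount. This needs to be done outside of ggplot2.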

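First, draw the "jitter" to add for each ID. Since each position on a categorical axis is 1 unit wide, I choose numbers between -.3 and .3. I use dplyr to do this work and set the seed so you will get the same results.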

library(dplyr)
set.seed(16)
ef2 = ef %>%
    group_by(id) %>%
    mutate(jitter = runif(1, min = -.3, max = .3)) %>%
    ungroup()
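As a quick sanity check on the step above, the sketch below confirms that every id ends up with exactly one jitter value. The toy data frame and the names `toy`, `toy2`, and `check` are assumptions made for illustration; they are not part of the original answer.

```r
library(dplyr)

# Toy data (hypothetical): two rows per id, mirroring the Gain/Spend pairs
toy <- data.frame(
  id        = c("A", "B", "C", "A", "B", "C"),
  gainspend = c("Gain", "Gain", "Gain", "Spend", "Spend", "Spend")
)

set.seed(16)
toy2 <- toy %>%
  group_by(id) %>%
  mutate(jitter = runif(1, min = -.3, max = .3)) %>%  # one draw per group
  ungroup()

# Count the distinct jitter values within each id; each count should be 1
check <- toy2 %>%
  group_by(id) %>%
  summarise(n_jitter = n_distinct(jitter), .groups = "drop")
```

Because `runif(1, ...)` returns a single value inside each group, `mutate()` recycles it across all rows of that group, which is what keeps paired points together.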

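Next comes the plot. I use a geom_blank() layer so that the categorical site axis is drawn before adding the jitter. I convert site to numeric via factor() and add the jitter; the numeric conversion only works on factors, and luckily categorical axes in ggplot2 are based on factors.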
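To illustrate the conversion, here is a minimal sketch of how a character vector maps to numeric positions through factor(). The vector `site_demo` is a made-up example, not the original data.

```r
# Hypothetical example vector; the real data uses the same three site names
site_demo <- c("Castle", "Temple", "Palace", "Temple")

f <- factor(site_demo)

# Levels are sorted alphabetically by default
site_levels <- levels(f)      # "Castle" "Palace" "Temple"

# Each value becomes the integer position of its level
site_codes <- as.numeric(f)   # 1 3 2 3
```

These integer codes are the positions a jitter offset is added to, which is why offsets within ±0.3 keep each point inside its own category band.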

Now the paired IDs move together.

ggplot(ef2, aes(x = date, y = site)) + 
    geom_blank() +
    geom_point(aes(size = amount, color = gainspend, 
                   y = as.numeric(factor(site)) + jitter),
               alpha=0.5) +
    scale_color_manual(values = ccode) +
    scale_size_continuous(range = c(1, 15), breaks = c(5, 10, 20))
#> Warning: Removed 15 rows containing missing values (geom_point).
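The warning above comes from the rows whose amount is NA (15 of the Spend rows in the sample data). If the warning is unwanted, one option is to drop those rows beforehand, and another is to pass `na.rm = TRUE` to the layer. The sketch below shows both on a small made-up data frame; `toy` and `toy_complete` are hypothetical names.

```r
library(ggplot2)

# Toy data (hypothetical); one Spend amount is missing
toy <- data.frame(
  date      = as.Date(c("2021-03-01", "2021-03-02", "2021-03-01", "2021-03-02")),
  site      = c("Castle", "Temple", "Castle", "Temple"),
  gainspend = c("Gain", "Gain", "Spend", "Spend"),
  amount    = c(6, 14, 2, NA)
)

# Option 1: drop rows with a missing amount before plotting
toy_complete <- toy[!is.na(toy$amount), ]

# Option 2: keep the data as is and let the layer drop NAs silently
p <- ggplot(toy, aes(date, site)) +
  geom_point(aes(size = amount, color = gainspend), alpha = 0.5, na.rm = TRUE)
```

Either way, a gain whose spend is missing is still drawn as a lone circle, since there is nothing to pair it with.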

Created on 2021-09-23 by the reprex package (v2.0.0)

You can add some jitter by id outside of the ggplot() call.

# One jitter value per unique id (not one per row), so paired points share it
ids <- unique(ef$id)
jj <- data.frame(id = ids, jtr = runif(length(ids), -0.3, 0.3))
ef <- merge(ef, jj, by = 'id')
ef$sitej <- as.numeric(factor(ef$site)) + ef$jtr
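As an alternative sketch to the merge() step, match() can look up one jitter value per id while leaving the row order and row count untouched. The toy data and the names `toy`, `ids`, and `jtr` are assumptions made for illustration.

```r
# Toy data (hypothetical): each id appears once as Gain and once as Spend
toy <- data.frame(
  id   = c("C1", "T1", "P1", "C1", "T1", "P1"),
  site = c("Castle", "Temple", "Palace", "Castle", "Temple", "Palace")
)

set.seed(16)
ids <- unique(toy$id)
jtr <- runif(length(ids), -0.3, 0.3)   # one draw per unique id

# match() gives, for each row, the position of its id within ids
toy$jtr   <- jtr[match(toy$id, ids)]
toy$sitej <- as.numeric(factor(toy$site)) + toy$jtr
```

Rows that share an id receive the same offset, so their jittered positions coincide.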

But you need site to be integer/numeric to do this, so when making the plot you have to add the axis labels manually with scale_y_continuous(). (Update: aosmith's geom_blank() trick above is a better solution!)

ggplot(ef,aes(date,sitej)) + 
  geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
  scale_color_manual(values=ccode) +
  scale_size_continuous(range=c(1,15),breaks=c(5,10,20)) +
  scale_y_continuous(breaks = 1:3, labels= sort(unique(ef$site)))

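This seems to work, but there are still a few gain/spend circles without partners. Maybe something is wrong with the id variable; note also that 15 of the 23 Spend amounts in the sample data are NA, so those gains have no spend circle to pair with.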

Maybe someone else has a better approach!