add plot inset (subplot) and multiple standard deviations to ggplot() with filtering based on conditions in R

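Part 1: We are trying to make a ggplot() that shows several different summary statistics depending on the x-axis value. We have more or less managed this, but we are open to suggestions that would make it more efficient: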

  1. For Time 0: error bars from 6 g to 20 g
  2. For Time 1 to 8: mean +/- 1 standard deviation (SD)
  3. For Time 9 to 12: mean +/- 2 SD
  4. For Time > 12: no error bars
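The four rules amount to a lookup from Time to a band. A minimal sketch with a hypothetical helper `error_band()` (not part of the question), assuming the bands are Time == 0, (0, 8], (8, 12] and > 12, as in the vector code at the bottom of the question, so that non-integer times such as 8.5 or 12.5 are not left unassigned:

```r
# Hypothetical helper: map each Time value to the error-bar rule it falls under.
error_band <- function(time) {
  band <- rep("none", length(time))                # Time > 12: no error bar
  band[time > 8 & time <= 12] <- "mean +/- 2 SD"   # (8, 12]
  band[time > 0 & time <= 8]  <- "mean +/- 1 SD"   # (0, 8]
  band[time == 0]             <- "fixed 6-20 g"    # Time 0: fixed limits
  band
}

error_band(c(0, 4, 10, 14))
```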

Part 2: The data we are actually working with has Times up to 3000, so we would like to include an inset (subplot) of 6 <= Time <= 10 in the upper-left corner of the plot.

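Part 3: In addition to the two things above, we also want to remove all data points that fall outside the error bars, so that we can show a "before" plot (all points, including those outside the error-bar limits) and an "after" plot (only points within the error-bar limits, up to Time == 12).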
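Once every point carries the limits of its own error bar, Part 3 is a single row filter. A minimal sketch on made-up numbers; `lower` and `upper` are hypothetical column names for the per-row bar limits, with NA where no bar is drawn (Time > 12):

```r
library(dplyr)

# Made-up example rows: lower/upper are the bar limits for each point's Time.
pts <- data.frame(
  Time   = c(0, 2, 2, 10, 14),
  weight = c(40, 50, 70, 120, 200),
  lower  = c(6, 45, 45, 100, NA),
  upper  = c(20, 55, 55, 140, NA)
)

# Keep a point if its Time has no bar, or if it lies inside its bar (inclusive).
after <- pts %>%
  filter(is.na(lower) | (weight >= lower & weight <= upper))

after$weight
```

Rows with NA limits are kept, which matches "only points within the error-bar limits, up to Time == 12"; here the points at 40 g and 70 g are dropped.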

For reproducibility, I am using a dataset that ships with R to illustrate the problem. The dataset:

library(datasets)
data(ChickWeight) #importing data from base R
summary(ChickWeight)

 weight           Time           Chick         Diet   
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
 Median :103.0   Median :10.00   20     : 12   3:120  
 Mean   :121.8   Mean   :10.72   10     : 12   4:118  
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
 Max.   :373.0   Max.   :21.00   19     : 12          
                                 (Other):506 
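ChickWeight is only measured on certain days, so it helps to see which Time values actually fall under each rule. A short sketch, assuming the same band edges as the vector code at the bottom of the question (Time == 0, (0, 8], (8, 12], > 12):

```r
data(ChickWeight, package = "datasets")

# Distinct measurement days in ChickWeight.
times <- sort(unique(ChickWeight$Time))
print(times)

# Group the days by the error-bar rule they fall under.
split(times, cut(times, breaks = c(-Inf, 0, 8, 12, Inf),
                 labels = c("fixed 6-20 g", "1 SD", "2 SD", "none")))
```

The days are 0, 2, 4, ..., 20 and 21, so "Time 1 to 8" covers days 2, 4, 6 and 8, and "Time 9 to 12" covers days 10 and 12.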

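I have been able to do this by first creating vectors (one each for ymin and ymax in geom_errorbar). The code for this is at the bottom of the question. We are open to suggestions on how to do this more efficiently.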

We then tried to put all of this into one ggplot() (unneeded formatting omitted):

#Import required package: 
library(ggplot2)
    
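#merge_stats, SDbelow and SDabove are created by the code at the bottom of the question
ggplot(merge_stats, aes(y = weight, x = as.numeric(Time))) +
    geom_jitter(color="grey", width=0.1)+
    geom_errorbar(aes(ymin=SDbelow, ymax=SDabove), width=0.1, linewidth=1)+ #linewidth needs ggplot2 >= 3.4.0 (older versions use size)
    stat_summary(
      geom = "point",
      fun = "mean",     #fun.y is deprecated since ggplot2 3.3.0
      col = "blue",
      size = 2,
      shape = 19,
      fill = "blue")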
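One efficiency gain: geom_errorbar() above draws one bar per row, so each Time gets dozens of identical, overlapping bars. A sketch of an alternative, assuming dplyr >= 1.0 and ggplot2 >= 3.3.0: summarise to one row per Time and give that table to geom_errorbar() through its own `data` argument. The names `bars`, `k`, `lower` and `upper` are illustrative, not from the question.

```r
library(dplyr)
library(ggplot2)

cw <- as.data.frame(datasets::ChickWeight)

# One row per Time: mean, SD and the bar limits under the question's rules.
# k is the number of SDs; NA means no error bar (Time > 12).
bars <- cw %>%
  group_by(Time) %>%
  summarise(mean_w = mean(weight), sd_w = sd(weight), .groups = "drop") %>%
  mutate(
    k     = case_when(Time <= 8 ~ 1, Time <= 12 ~ 2, TRUE ~ NA_real_),
    lower = if_else(Time == 0, 6,  mean_w - k * sd_w),
    upper = if_else(Time == 0, 20, mean_w + k * sd_w)
  )

p <- ggplot(cw, aes(x = Time, y = weight)) +
  geom_jitter(colour = "grey", width = 0.1) +
  # Bars come from the per-Time summary, so each is drawn once, not once per chick.
  geom_errorbar(data = bars, aes(x = Time, ymin = lower, ymax = upper),
                inherit.aes = FALSE, width = 0.1, na.rm = TRUE) +
  # `fun` replaces the deprecated `fun.y`.
  stat_summary(geom = "point", fun = mean, colour = "blue", size = 2, shape = 19)

print(p)
```

`na.rm = TRUE` silently skips the Time > 12 rows whose limits are NA, so no bar is drawn there.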

This produces the main plot: grey jittered points, the error bars, and the per-Time means in blue.

How do we add a subplot (inset) to this ggplot(), AND how do we then go on to remove the points that fall outside the standard-deviation limits set above?

So the end product should be two plots: one with all data points plus the inset, and one with the out-of-range points removed plus the inset.


Code for building the different standard-deviation limits for each age group. Note that we are open to suggestions for simplifying this.

#loading required packages
library(dplyr)
library(pracma)

#Creates a table that includes the SD of each age and the mean of each age 
merge_stats <- ChickWeight %>% 
    arrange(Time) %>% 
    group_by(Time) %>%
    mutate(MEAN=mean(weight), SD = sd(weight), SDt=2*sd(weight)) #add to data 

#Vector for Time==0:      
merge_stats_age_zero <- merge_stats %>%  
    filter(Time==0)
vl <-  length(merge_stats_age_zero$weight)
MSZUL=linspace(20, 20, vl) #Vector for top bound 
MSZLL=linspace(6, 6, vl)   #Vector for bottom bound 

#Vector for Time>=1, Time<=8:
mergesaot <- merge_stats %>%                  
    filter(Time>=1, Time<=8)

#vectors for +/- 1 SD for Time>=1, Time<=8:
otoerr = mergesaot$MEAN+mergesaot$SD
otberr = mergesaot$MEAN-mergesaot$SD

#Vector for Time>8, Time<=12:
mergesef <- merge_stats %>%                   
    filter(Time>8) %>%
    filter(Time<=12)

#vectors for +/- 2 SD for Time>8, Time<=12:
efoerr <- mergesef$MEAN+mergesef$SDt
efberr <- mergesef$MEAN-mergesef$SDt

#Combining vectors together:
LSDabove <- c(MSZUL ,otoerr, efoerr)
LSDbelow <- c(MSZLL ,otberr, efberr)

#To generate the final vector we first need its length: the total length minus the combined length of the three pieces above.
m_swt <- c(merge_stats$SD)
finpeice <- length(m_swt) - length(LSDabove)

#Knowing the length, we generate a vector of NAs to cover the remaining rows. NA (rather than 0) makes geom_errorbar() skip these rows (with a "Removed rows" warning) instead of drawing a flat bar at y = 0
finpeiceVec <- rep(NA_real_, finpeice)

#Finally, we have generated the two vectors that represent our error bars
SDabove <- c(MSZUL ,otoerr, efoerr, finpeiceVec)
SDbelow <- c(MSZLL ,otberr, efberr, finpeiceVec)
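The vector-building code above can be collapsed into one grouped mutate() with dplyr::case_when(). Every limit then sits on the same row as its point, so no manual concatenation or length bookkeeping is needed. A sketch, assuming dplyr >= 1.0; it reuses the question's names `SDbelow`/`SDabove` as columns, so the existing `aes(ymin = SDbelow, ymax = SDabove)` calls would pick up the columns (data columns take precedence over global vectors), and it uses NA rather than 0 for Time > 12.

```r
library(dplyr)
data(ChickWeight, package = "datasets")

# Attach per-row bar limits in one grouped mutate(); NA means "no error bar".
merge_stats <- as.data.frame(ChickWeight) %>%
  group_by(Time) %>%
  mutate(
    MEAN = mean(weight),
    SD   = sd(weight),
    SDbelow = case_when(
      Time == 0  ~ 6,               # fixed lower limit at Time 0
      Time <= 8  ~ MEAN - SD,       # (0, 8]: mean - 1 SD
      Time <= 12 ~ MEAN - 2 * SD,   # (8, 12]: mean - 2 SD
      TRUE       ~ NA_real_
    ),
    SDabove = case_when(
      Time == 0  ~ 20,              # fixed upper limit at Time 0
      Time <= 8  ~ MEAN + SD,
      Time <= 12 ~ MEAN + 2 * SD,
      TRUE       ~ NA_real_
    )
  ) %>%
  ungroup()
```

With ChickWeight every weight is at least 35 g (see the summary above), so all Time 0 points fall outside the fixed 6 to 20 g bar and would be removed by the Part 3 filter.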

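This is really two questions. Removing the points outside the error bars is just a matter of filtering the data after the summary statistics have been computed on the main dataset; if you are struggling with that, a dedicated question would be better. Here I'll show how to insert a subplot using grid, which ggplot2 is built on: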

subset1 <- which(merge_stats$Time >= 6 & merge_stats$Time <= 10)

p1 <- ggplot(merge_stats[subset1, ], 
             aes(y = weight, x = as.numeric(Time))) +
  geom_jitter(color="grey", width=0.1)+
  geom_errorbar(aes(ymin=SDbelow[subset1], ymax=SDabove[subset1]), width=0.1, linewidth=1)+
  stat_summary(
    geom = "point",
    fun = "mean",
    col = "blue",
    size = 2,
    shape = 19,
    fill = "blue")

# Draw the main plot first
p_main <- ggplot(merge_stats, aes(y = weight, x = as.numeric(Time))) +
  geom_jitter(color="grey", width=0.1)+
  geom_errorbar(aes(ymin=SDbelow, ymax=SDabove), width=0.1, linewidth=1)+
  stat_summary(
    geom = "point",
    fun = "mean",
    col = "blue",
    size = 2,
    shape = 19,
    fill = "blue")

print(p_main)

# Then overlay the inset in a viewport centred at (0.3, 0.7), i.e. the upper-left region
vp <- grid::viewport(width = 0.4, height = 0.4, x = 0.3, y = 0.7)

print(p1, vp = vp)
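The viewport approach overlays the inset at print time, so the result is not a single ggplot object, and ggsave(), which saves one ggplot object, would not capture the overlay. A sketch of an alternative, assuming dplyr >= 1.0 and ggplot2 >= 3.3.0: turn the inset into a grob with ggplotGrob() and place it with annotation_custom() in data coordinates. It also builds the "before" and "after" pair the question asks for. `plot_with_inset()` is a hypothetical helper, and the inset position (x from 0 to 10, top 45% of the y range) is an assumption to tune for the real data.

```r
library(dplyr)
library(ggplot2)
data(ChickWeight, package = "datasets")

# Per-row bar limits under the question's rules (NA = no bar for Time > 12).
cw <- as.data.frame(ChickWeight) %>%
  group_by(Time) %>%
  mutate(
    k     = case_when(Time <= 8 ~ 1, Time <= 12 ~ 2, TRUE ~ NA_real_),
    lower = if_else(Time == 0, 6,  mean(weight) - k * sd(weight)),
    upper = if_else(Time == 0, 20, mean(weight) + k * sd(weight))
  ) %>%
  ungroup()

# Hypothetical helper: main plot with a 6 <= Time <= 10 inset in the top-left.
plot_with_inset <- function(df) {
  base_plot <- function(d) {
    ggplot(d, aes(x = Time, y = weight)) +
      geom_jitter(colour = "grey", width = 0.1) +
      geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, na.rm = TRUE) +
      stat_summary(geom = "point", fun = mean, colour = "blue", size = 2, shape = 19)
  }
  inset <- ggplotGrob(base_plot(filter(df, Time >= 6, Time <= 10)))
  top <- max(df$weight)
  # annotation_custom() positions the grob in the main plot's data coordinates.
  base_plot(df) +
    annotation_custom(inset, xmin = 0, xmax = 10, ymin = 0.55 * top, ymax = top)
}

# "Before": all points. "After": points with no bar, or inside their bar.
before_plot <- plot_with_inset(cw)
after_data  <- filter(cw, is.na(lower) | (weight >= lower & weight <= upper))
after_plot  <- plot_with_inset(after_data)

print(before_plot)
print(after_plot)
```

Because each result is an ordinary ggplot object, both can be saved with ggsave() like any other plot.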