Fill stat_ecdf with two different colors

Question

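I am doing a hypothesis test for a class using a Bayesian model. I want to make a nice plot with ggplot that shows the two hypothesis regions in two different colors.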

Normal distribution

I want to fill region H1 with a different color from region H0.

My code is:

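# Load ggplot2 for plotting and dplyr for the %>% pipe
library(ggplot2)
library(dplyr)

# Parameters of the normal distribution: mean and variance
param1 <- 1.74
param2 <- 0.000617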

#Normal simulation
sim_posteriori <- data.frame(rnorm(1000, param1, sqrt(param2)), rep('Posteriori', 1000))
names(sim_posteriori) <- c('Datos', 'Grupo')

# Hypothesis contrast
# P(H0) -> mu <= 1.75
pnorm(1.75, param1, sqrt(param2))
# P(H1) -> mu > 1.75
1 - pnorm(1.75, param1, sqrt(param2))

#Plot
sim_posteriori %>% ggplot(aes(Datos)) +
  stat_ecdf(fill = '#F2C14E95', geom = 'density') +
  geom_vline(aes(xintercept = 1.75), lty = 2, size = 1) +
  labs(title = 'Distribución posteriori y acumulada') +
  xlab('Altura(en metros)') + 
  ylab('Densidad') + 
  theme_minimal() +
  annotate('text', x = 1.735, y = 0.25, label = 'Región H1') +
  annotate('text', x = 1.79, y = 0.25, label = 'Región H0')

Answer 1

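If you find yourself wondering how to get ggplot to perform complex manipulations of your data through its various stat_ functions, you are probably approaching the problem the wrong way. Those functions exist to make common, simple transformations convenient, but ggplot is a tool for plotting, not for wrangling data. If a stat_ function does not do what you want, it is usually best to prepare the data you actually want to plot, and then plot it.

In this case, it is straightforward to build your own ECDF in a data frame outside ggplot, mark which parts of it lie above and below your threshold, and then plot it with geom_area: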

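# Build the empirical CDF by hand: sorted draws on x, cumulative proportion on y
h  <- sort(sim_posteriori$Datos)
# region is TRUE for draws above the 1.75 threshold, FALSE otherwise
df <- data.frame(x = h, y = seq_along(h)/length(h), region = h > 1.75)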

ggplot(df, aes(x, y, fill = region)) +
  geom_area() +
  geom_vline(aes(xintercept = 1.75), lty = 2, size = 1) +
  scale_fill_manual(values = c('#F2C14E95', '#C14E4295'), guide = "none") +
  labs(title = 'Distribución posteriori y acumulada',
       x = 'Altura(en metros)', y = 'Densidad') + 
  theme_minimal() +
  annotate('text', x = 1.735, y = 0.25, label = 'Región H1') +
  annotate('text', x = 1.79, y = 0.25, label = 'Región H0')
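
If you want to convince yourself that the hand-built data frame really is the ECDF, a quick check against base R's ecdf() may help. The following is only a sketch: it re-simulates the draws with an arbitrary seed (my assumption, not part of the question) so that it runs on its own, and the names threshold, draws, reference, p_emp and p_exact are illustrative.

```r
# Sketch: compare the hand-built ECDF with stats::ecdf() (base R only)
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
threshold <- 1.75
draws <- rnorm(1000, mean = 1.74, sd = sqrt(0.000617))

# Same construction as in the answer: sorted draws and cumulative proportion
h  <- sort(draws)
df <- data.frame(x = h, y = seq_along(h) / length(h), region = h > threshold)

# ecdf() returns a step function; evaluate it at the sorted draws
reference <- ecdf(draws)(h)
print(all.equal(df$y, reference))

# Share of draws at or below the threshold vs. the exact normal probability
p_emp   <- mean(!df$region)
p_exact <- pnorm(threshold, 1.74, sqrt(0.000617))
print(c(empirical = p_emp, exact = p_exact))
```

The two probabilities should be close but not identical, since p_emp comes from 1000 simulated draws. The share of rows with region equal to FALSE is also where the fill color changes in the plot, so it ties the picture back to the pnorm() value computed in the question.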

r

bayesian

ggplot2