Plotting multiple normal curves with ggplot2 without hardcoding means and standard deviations
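I have a vector of means and a vector of standard deviations, and I want to use ggplot2 to plot the corresponding normal densities in the same figure. I solved this with mapply and gather, but it takes a lot of lines of code for something I think should be trivial: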
library(dplyr)
library(tidyr)
library(ggplot2)
# generate data
my_data <- data.frame(mean = c(0.032, 0.04, 0.038, 0.113, 0.105, 0.111),
                      stdev = c(0.009, 0.01, 0.01, 0.005, 0.014, 0.006),
                      test = factor(c("Case_01", "Case_02", "Case_03", "Case_04",
                                      "Case_05", "Case_06")))
# points at which to evaluate the Gaussian densities
x <- seq(-0.05, 0.2, by = 0.001)
# build list of Gaussian density vectors based on means and standard deviations
pdfs <- mapply(dnorm, mean = my_data$mean, sd = my_data$stdev, MoreArgs = list(x = x),
               SIMPLIFY = FALSE)
# add group names
names(pdfs) <- my_data$test
# convert list to dataframe
pdfs <- do.call(cbind.data.frame, pdfs)
pdfs$x <- x
# convert dataframe to tall format
tall_df <- gather(pdfs, test, density, -x)
# build plot
p <- ggplot(tall_df, aes(color = test, x = x, y = density)) +
  geom_line() +
  geom_segment(data = my_data, aes(color = test, x = mean, y = 0,
                                   xend = mean, yend = 100), linetype = "dashed") +
  coord_cartesian(ylim = c(-1, 100))
print(p)
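In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping step using pivot_longer(), which also lets mapply() keep its default SIMPLIFY = TRUE and return a matrix (one column per test) instead of a list that has to be cbind-ed. This is an alternative spelling of the question's code, not the original, and it assumes a tidyr version that provides pivot_longer():

```r
library(tidyr)

# Same inputs as in the question
my_data <- data.frame(mean  = c(0.032, 0.04, 0.038, 0.113, 0.105, 0.111),
                      stdev = c(0.009, 0.01, 0.01, 0.005, 0.014, 0.006),
                      test  = factor(c("Case_01", "Case_02", "Case_03",
                                       "Case_04", "Case_05", "Case_06")))
x <- seq(-0.05, 0.2, by = 0.001)

# With the default SIMPLIFY = TRUE, mapply() returns a matrix:
# one row per x value, one column per (mean, stdev) pair
dens <- mapply(function(m, s) dnorm(x, mean = m, sd = s),
               my_data$mean, my_data$stdev)
colnames(dens) <- as.character(my_data$test)

# Wide data frame: the shared x column plus one density column per test
pdfs <- data.frame(x = x, dens, check.names = FALSE)

# pivot_longer() replaces gather(pdfs, test, density, -x)
tall_df <- pivot_longer(pdfs, cols = -x,
                        names_to = "test", values_to = "density")
```

The resulting tall_df has the columns x, test and density, so it can be passed to the same ggplot() call as before; note that test comes back as a character column rather than a factor.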
This is very similar to:
Plot multiple normal curves in same plot
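In fact, the accepted answer uses mapply, which confirms that I'm on the right track. However, what I don't like about that answer is that it hardcodes the means and standard deviations in the mapply call. That doesn't work for my use case, because I read the real data from disk (in the MRE I skipped the data-reading part for simplicity, of course). Is it possible to simplify my code, without sacrificing readability and without hardcoding the vectors of means and standard deviations in the mapply call?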
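EDIT: It might be possible to avoid the call to mapply by using the mvtnorm package, but I don't think that would give any real simplification here. Most of my code comes after the call to mapply.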
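The mapply() call and the reshaping after it can arguably both be avoided without mvtnorm, because dnorm() is vectorised over x, mean and sd: it is enough to pair every row of my_data with every x value and call dnorm() once on the resulting columns. A minimal base-R sketch of that idea, using merge() for the cross join (when the two data frames share no column names, merge() returns their Cartesian product). It also swaps the geom_segment() markers for geom_vline(); that substitution is an assumption of this sketch, not part of the original plot, and the dashed lines span the whole panel instead of stopping at y = 100:

```r
library(ggplot2)

my_data <- data.frame(mean  = c(0.032, 0.04, 0.038, 0.113, 0.105, 0.111),
                      stdev = c(0.009, 0.01, 0.01, 0.005, 0.014, 0.006),
                      test  = factor(c("Case_01", "Case_02", "Case_03",
                                       "Case_04", "Case_05", "Case_06")))
x <- seq(-0.05, 0.2, by = 0.001)

# Cross join: merge() with no shared column names returns every
# combination of a my_data row and an x value (Cartesian product)
tall_df <- merge(my_data, data.frame(x = x))

# dnorm() is vectorised over all three arguments, so a single call
# computes the density for every (x, mean, stdev) row at once
tall_df$density <- dnorm(tall_df$x, mean = tall_df$mean, sd = tall_df$stdev)

p <- ggplot(tall_df, aes(x = x, y = density, color = test)) +
  geom_line() +
  # Dashed vertical line at each mean (stands in for the geom_segment() layer)
  geom_vline(data = my_data, aes(xintercept = mean, color = test),
             linetype = "dashed")
```

print(p) draws the plot. Since nothing refers to the individual values, my_data can come straight from read.csv() or any other reader.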
You can save some coding with purrr::pmap_df, which automatically row-binds the data frames it builds for each mean-stdev pair:
This assumes that the input columns of my_data are in the order mean, stdev, test, and that test is of class character.
library(purrr)
library(tibble)
# tibble() replaces the deprecated data_frame()
tall_df2 <- pmap_df(my_data, ~ tibble(x = x, test = ..3, density = dnorm(x, ..1, ..2)))
With the data:
my_data <- data.frame(mean = c(0.032, 0.04, 0.038, 0.113, 0.105, 0.111),
                      stdev = c(0.009, 0.01, 0.01, 0.005, 0.014, 0.006),
                      test = c("Case_01", "Case_02", "Case_03", "Case_04", "Case_05", "Case_06"),
                      stringsAsFactors = FALSE)
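The column-order assumption comes from the positional ..1, ..2, ..3 placeholders in the formula. A minimal sketch that removes it, relying on pmap() passing each column of a data frame to the function as a named argument: give the function arguments with the same names as the columns, so they are matched by name rather than position. make_rows is a hypothetical helper name. pmap_dfr() is the same function as pmap_df(); both are superseded as of purrr 1.0.0 (pmap() followed by list_rbind() is the suggested replacement there), and the row-binding needs dplyr to be installed:

```r
library(purrr)
library(tibble)

# Same data as in the answer, with test stored as character
my_data <- data.frame(mean  = c(0.032, 0.04, 0.038, 0.113, 0.105, 0.111),
                      stdev = c(0.009, 0.01, 0.01, 0.005, 0.014, 0.006),
                      test  = c("Case_01", "Case_02", "Case_03",
                                "Case_04", "Case_05", "Case_06"),
                      stringsAsFactors = FALSE)
x <- seq(-0.05, 0.2, by = 0.001)

# Hypothetical helper: the argument names match my_data's column names,
# so pmap() matches each column to an argument by name, not by position
make_rows <- function(mean, stdev, test) {
  tibble(x = x, test = test, density = dnorm(x, mean = mean, sd = stdev))
}

# One tibble per row of my_data, row-bound into a single tall tibble
tall_df2 <- pmap_dfr(my_data, make_rows)
```

Because arguments are matched by name, the columns of my_data may come in any order; an extra column would, however, cause an "unused argument" error unless make_rows also accepts ....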
Plot:
p <- ggplot(tall_df2, aes(color = factor(test), x = x, y = density)) +
  geom_line() +
  geom_segment(data = my_data, aes(color = test, x = mean, y = 0,
                                   xend = mean, yend = 100), linetype = "dashed") +
  coord_cartesian(ylim = c(-1, 100))
print(p)
Gives: