Illustrate standard deviation in histogram
Consider the following simple example:
# E. Musk in Grunheide
set.seed(22032022)
# generate random numbers
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)
# empirical sd
sd(randomNumbers)
#> [1] 10.34369
# histogram
hist(randomNumbers, probability = TRUE, main = "", breaks = 50)
# just for illustration purposes
###
# empirical density
lines(density(randomNumbers), col = 'black', lwd = 2)
# theoretical density
curve(dnorm(x, mean = 10, sd = 10), col = "blue", lwd = 2, add = TRUE)
###
Created on 2022-03-22 by the reprex package (v2.0.1)
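Question:
Is there a good way to illustrate the empirical standard deviation (sd) in the histogram using colour?
For example, by drawing the inner bars in a different colour, or by marking the sd range on the x-axis with an interval, i.e. [mean +/- sd]?
Note that a simple solution using ggplot2 would also be much appreciated.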
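Here is a ggplot solution. First calculate the mean and sd and save the values in separate variables. Then use an ifelse statement to classify each value as "Within range" or "Outside range", and fill the two groups with different colours.
The blue line represents the normal distribution stated in your question, and the black line represents the density of the histogram we are plotting.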
library(ggplot2)
set.seed(22032022)
# generate random numbers
randomNumbers <- rnorm(n=1000, mean=10, sd=10)
randomNumbers_mean <- mean(randomNumbers)
randomNumbers_sd <- sd(randomNumbers)
ggplot(data.frame(randomNumbers = randomNumbers), aes(randomNumbers)) +
  geom_histogram(
    aes(
      fill = ifelse(
        randomNumbers > randomNumbers_mean + randomNumbers_sd |
          randomNumbers < randomNumbers_mean - randomNumbers_sd,
        "Outside range",
        "Within range"
      )
    ),
    binwidth = 1, col = "gray"
  ) +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_density(aes(y = after_stat(count))) +
  # theoretical density scaled to counts: n * binwidth = 1000 * 1
  stat_function(fun = function(x) dnorm(x, mean = 10, sd = 10) * 1000,
                color = "blue") +
  labs(fill = "Data")
Created on 2022-03-22 by the reprex package (v2.0.1)
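The factor 1000 passed to stat_function above is not arbitrary: in a histogram with equal-width bins, each count equals the density times the number of observations times the bin width, which here is 1000 * 1. A minimal base R sketch of that relationship follows. The names n, binwidth, breaks, scale_factor and scaled_normal are introduced here for illustration and do not appear in the answer.

```r
set.seed(22032022)
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

n <- length(randomNumbers)
binwidth <- 1  # same bin width as in the geom_histogram() call

# Equal-width breaks that cover the whole sample
breaks <- seq(floor(min(randomNumbers)), ceiling(max(randomNumbers)),
              by = binwidth)
h <- hist(randomNumbers, breaks = breaks, plot = FALSE)

# count = density * n * binwidth holds for every bin
scale_factor <- n * binwidth
stopifnot(isTRUE(all.equal(as.numeric(h$counts), h$density * scale_factor)))

# The same factor puts a theoretical density on the count scale
scaled_normal <- function(x) dnorm(x, mean = 10, sd = 10) * scale_factor
```

With a different bin width, say binwidth = 2, the multiplier would become 2000, so deriving it from n and binwidth avoids hard-coding the value.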
This is similar to Benson's ggplot solution, except that we pre-compute the histogram and use geom_col, so that we don't get any unwanted stacking at the sd boundaries:
# E. Musk in Grunheide
set.seed(22032022)
# generate random numbers
randomNumbers <- rnorm(n=1000, mean=10, sd=10)
h <- hist(randomNumbers, breaks = 50, plot = FALSE)
lower <- mean(randomNumbers) - sd(randomNumbers)
upper <- mean(randomNumbers) + sd(randomNumbers)
df <- data.frame(x = h$mids, y = h$density,
                 fill = h$mids > lower & h$mids < upper)
library(ggplot2)
ggplot(df) +
  geom_col(aes(x, y, fill = fill), width = 1, color = 'black') +
  geom_density(data = data.frame(x = randomNumbers),
               aes(x = x, color = 'Actual density'),
               key_glyph = 'path') +
  geom_function(fun = function(x) {
    dnorm(x, mean = mean(randomNumbers), sd = sd(randomNumbers)) },
    aes(color = 'theoretical density')) +
  scale_fill_manual(values = c(`TRUE` = '#FF374A', 'FALSE' = 'gray'),
                    name = 'within 1 SD') +
  scale_color_manual(values = c('black', 'blue'), name = 'Density lines') +
  labs(x = 'Value of random number', y = 'Density') +
  theme_minimal()
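The base R plot from the question can be coloured in the same way, because plot() on a precomputed histogram object accepts one colour per bar. A minimal sketch under the same data assumptions follows. The colours and the names inside and bar_cols are illustrative choices, not part of the original answers.

```r
set.seed(22032022)
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

lower <- mean(randomNumbers) - sd(randomNumbers)
upper <- mean(randomNumbers) + sd(randomNumbers)

# Compute the bins first, without drawing anything
h <- hist(randomNumbers, breaks = 50, plot = FALSE)

# Classify each bin by its midpoint, as in the geom_col version
inside <- h$mids > lower & h$mids < upper
bar_cols <- ifelse(inside, "tomato", "gray")

# Draw the histogram on the density scale with one colour per bar
plot(h, freq = FALSE, col = bar_cols, main = "", xlab = "randomNumbers")
lines(density(randomNumbers), col = "black", lwd = 2)

# Mark the interval [mean - sd, mean + sd] on the x-axis
abline(v = c(lower, upper), lty = 2)
segments(x0 = lower, y0 = 0, x1 = upper, y1 = 0, lwd = 4, col = "tomato")
```

Because each bin is classified by its midpoint, a bin that straddles a boundary gets a single colour instead of being split.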
# Alternative: bin the values with cut() and map the resulting factor to fill
data.frame(rand = randomNumbers,
           cut = {
             sd <- sd(randomNumbers)
             mn <- mean(randomNumbers)
             cut(randomNumbers, c(-Inf, mn - sd, mn + sd, Inf))
           }) |>
  ggplot(aes(x = rand, fill = cut)) +
  geom_histogram()
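To see what cut() contributes in the snippet above, it can help to look at the resulting factor on its own. A small sketch, assuming the same data, follows. The labels and the names s, grp and prop_within are added here for readability and are not in the original answer.

```r
set.seed(22032022)
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

mn <- mean(randomNumbers)
s <- sd(randomNumbers)  # named s so that the sd() function is not shadowed

# Three groups: below mean - sd, within one sd of the mean, above mean + sd
grp <- cut(randomNumbers,
           breaks = c(-Inf, mn - s, mn + s, Inf),
           labels = c("below mean - sd", "within 1 sd", "above mean + sd"))

# Number of observations per group
table(grp)

# Share of observations within one empirical sd of the mean
prop_within <- mean(grp == "within 1 sd")
prop_within
```

For normally distributed data, roughly 68% of the values are expected to fall in the middle group, so prop_within gives a quick sanity check on the colouring.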