Preventing wrong density plots when coloring histograms according to groups
Based on some dummy data, I created a histogram with a density plot:

```r
library(ggplot2)

set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

a <- ggplot(wdata, aes(x = weight))
a + geom_histogram(aes(y = ..density..
                       # , color = sex
                       ),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2
               # , aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))
```
The histogram of `weight` should be coloured according to `sex`, so I used `aes(y = ..density.., color = sex)` in `geom_histogram()`:
```r
a + geom_histogram(aes(y = ..density..,
                       color = sex),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2
               # , aes(color = sex)
               ) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))
```
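As intended, the density plot stays the same (it still covers both groups combined), but the histogram is scaled up (each group now seems to be handled separately):

How can I prevent this? I need individually coloured histogram bars, but a single joint density plot for all coloured groups.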
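One way to see what is going on: `stat_bin()` computes `density` separately for each group, and mapping `color = sex` splits the data into two groups, so each group's bars integrate to 1 on their own instead of sharing a total area of 1. The sketch below assumes ggplot2 ≥ 3.3.0 (for `after_stat()`, the newer spelling of `..density..`) and uses `layer_data()` to inspect the computed bars; the fixed bin width of 0.25 is an assumption made so that bar areas are easy to compute.

```r
library(ggplot2)

set.seed(1234)
wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# Same idea as the question: density-scaled bars, coloured by sex
p <- ggplot(wdata, aes(x = weight)) +
  geom_histogram(aes(y = after_stat(density), color = sex),
                 binwidth = 0.25, position = "identity")

# layer_data() returns the computed statistics: one row per bar per group
bars <- layer_data(p)

# Area under each group's bars: sum of (height * bar width) within the group
area_per_group <- tapply(bars$density * (bars$xmax - bars$xmin),
                         bars$group, sum)
print(area_per_group)  # each group integrates to about 1 on its own
```

Since both groups reach an area of 1, the coloured bars together cover twice the area of the single joint density curve, which matches the scaling seen in the plot.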
P.S. Using `aes(color = sex)` in `geom_density()` brings everything back to the original scale, but I don't want separate density plots (as shown below):
```r
a + geom_histogram(aes(y = ..density..,
                       color = sex),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2,
               aes(color = sex)) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))
```
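Edit:

As suggested, dividing the `y` aesthetic of `geom_histogram()` by the number of groups, i.e. `y = ..density../2`, gives an approximate solution. However, this only works for symmetric distributions, as the first output below shows: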
```r
a + geom_histogram(aes(y = ..density.. / 2,
                       color = sex),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))
```
This produces:
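However, less symmetric distributions can cause problems with this approach. See below, where `y = ..density../5` was used for 5 groups; first the original, then the processed version (`position = "stack"`):

Because the distribution is left-heavy, dividing by 5 underestimates the left side and overestimates the right side.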
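A short derivation suggests that unequal group sizes, rather than asymmetry as such, are what make division by the number of groups inexact. The notation here is introduced for illustration and is not from the original post: $N$ is the total number of observations, $n_g$ the size of group $g$, $c_g(x)$ that group's count in the bin containing $x$, $c(x) = \sum_g c_g(x)$ the pooled count, and $w$ the bin width. The per-group density that `..density..` gives, the height each group should contribute to the joint density, and their sum are:

$$
d_g(x) = \frac{c_g(x)}{n_g\,w},
\qquad
h_g(x) = \frac{c_g(x)}{N\,w} = \frac{n_g}{N}\,d_g(x),
\qquad
\sum_g h_g(x) = \frac{c(x)}{N\,w}.
$$

Dividing $d_g(x)$ by the number of groups $k$ equals $h_g(x)$ only when every group has $n_g = N/k$, as with the two groups of 200 in the dummy data. If the left-hand groups in the 5-group example hold more observations than the right-hand ones, that would account for the under- and overestimation; this is an inference, not something the post states.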
Edit 2: Solution

Following Andrew's suggestion, the following (complete) code solves the problem:
```r
library(ggplot2)

set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

binwidth <- 0.25
a <- ggplot(wdata,
            aes(x = weight,
                # Pass binwidth to aes() so it will be found in
                # geom_histogram()'s aes() later
                binwidth = binwidth))

# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
                   binwidth = binwidth,
                   colour = "black",
                   fill = "white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25))

# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
                       # binwidth will only be found if passed to
                       # ggplot()'s aes() (as above)
                       y = ..count.. / (sum(..count..) * binwidth),
                       color = sex),
                   binwidth = binwidth,
                   fill = "white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25)) +
  # Hide the colour legend ("none" replaces the deprecated FALSE)
  guides(color = "none")
```
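Note: `binwidth = binwidth` has to be passed to the `aes()` of `ggplot()`; otherwise the `aes()` of `geom_histogram()` will not find the pre-specified `binwidth`. In addition, `position = "stack"` is specified so that the two versions of the histogram are comparable. Plots for the dummy data and for the more complex distribution below: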
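The comparability of the two versions can also be checked numerically. The sketch below assumes ggplot2 ≥ 3.3.0 for `after_stat()` (the newer spelling of `..count..`) and uses a plain variable `bw` (a name chosen here for illustration) from the calling environment instead of routing it through `aes()`; in the ggplot2 versions I know of, `aes()` captures that environment, so the lookup works. It stacks the per-group bars and compares each bin's total height with the uncoloured `density` histogram.

```r
library(ggplot2)

set.seed(1234)
wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
bw <- 0.25  # bin width shared by both plots

# Uncoloured histogram: a single group, so density is the joint density
p_all <- ggplot(wdata, aes(x = weight)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw)

# Coloured histogram: after_stat() is evaluated on the whole layer, so
# sum(count) is the total over both groups, not the per-group total
p_sex <- ggplot(wdata, aes(x = weight)) +
  geom_histogram(aes(y = after_stat(count / (sum(count) * bw)), color = sex),
                 binwidth = bw, position = "stack")

all_bars <- layer_data(p_all)
sex_bars <- layer_data(p_sex)

# Height of each stacked segment, summed per bin (both groups share bin centres)
stacked <- tapply(sex_bars$ymax - sex_bars$ymin, sex_bars$x, sum)
joint   <- setNames(all_bars$density, all_bars$x)

ok <- isTRUE(all.equal(as.numeric(stacked),
                       as.numeric(joint[names(stacked)]),
                       tolerance = 1e-6))
print(ok)
```

Both plots share the same bins because `stat_bin()` derives the breaks from the full x range of the layer, which is why the per-bin comparison lines up.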
Solved, thanks for your help!

I don't think you can do this with `y = ..density..`, but you can recreate the same thing like this:
```r
binwidth <- 0.25  # easiest to set this manually so that you know what it is
a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
                       color = sex),
                   binwidth = binwidth,
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))
```
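Because `sum(..count..)` runs over every group in the layer, this scaling does not assume equal group sizes, unlike dividing `..density..` by the number of groups, which as far as I can tell is only exact when all groups are the same size. A sketch with hypothetical unequal groups (300 "F" and 100 "M", invented for illustration) compares each bar with ggplot2's own per-group density; assumes ggplot2 ≥ 3.3.0 for `after_stat()`.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical unequal groups: 300 "F" and 100 "M" observations (N = 400)
uneven <- data.frame(
  sex    = factor(rep(c("F", "M"), times = c(300, 100))),
  weight = c(rnorm(300, 55), rnorm(100, 58))
)
bw <- 0.25

# Andrew's scaling, written with after_stat()
p <- ggplot(uneven, aes(x = weight)) +
  geom_histogram(aes(y = after_stat(count / (sum(count) * bw)), color = sex),
                 binwidth = bw, position = "identity")
bars <- layer_data(p)

# Ratio of the plotted height to the per-group density, for non-empty bars
nonzero <- bars$count > 0
ratio <- tapply(bars$y[nonzero] / bars$density[nonzero],
                bars$group[nonzero], mean)
print(ratio)  # about 0.75 for group 1 (F) and 0.25 for group 2 (M)
```

The ratio comes out as each group's share of the observations, so a fixed divisor such as 2 (or 5) could only match it when all groups have the same size.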