Labeling a density plot in ggplot2 with counts
Question
How can I add labels showing the number of observations along a density plot?
Data
My dataset:
mwe <- structure(list(Gender = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("Female", "Male"), class = "factor"),
Age = c(23, 23, 23, 23, 23, 23, 39, 39, 39, 39, 39, 39, 30,
30, 30, 30, 30, 30, 30, 30, 24, 24, 24, 24, 24, 24, 24, 24,
18, 18, 18, 18, 18, 18, 23, 23, 23, 23, 23, 23, 23, 23, 26,
26, 26, 26, 26, 26, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 30, 30, 30, 30, 30, 30, 20, 20, 20, 20, 20,
20, 25, 25, 25, 25, 25, 25, 25, 25, 23, 23, 23, 23, 23, 23,
23, 23, 38, 38, 38, 38, 38, 38, 22, 22, 22, 22, 22, 22, 29,
29, 29, 29, 29, 29, 21, 21, 21, 21, 21, 21, 23, 23, 23, 23,
23, 23, 25, 25, 25, 25, 25, 25, 24, 24, 24, 24, 24, 24, 21,
21, 21, 21, 21, 21, 27, 27, 27, 27, 27, 27, 24, 24, 24, 24,
24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 21, 21,
21, 21, 27, 27, 27, 27, 27, 27, 34, 34, 34, 34, 34, 34, 26,
26, 26, 26, 26, 26, 26, 26, 28, 28, 28, 28, 28, 28, 39, 39,
39, 39, 39, 39, 26, 26, 26, 26, 26, 26), KmEuc = structure(c(1L,
1L, 1L, 1L, 3L, 3L, 2L, 2L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 3L, 3L, 3L, 2L, 3L,
2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 2L, 2L,
3L, 2L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 2L,
2L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 3L,
2L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L,
2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 2L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 2L,
2L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L,
3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor")), class = "data.frame", row.names = c(NA,
-218L))
I want to show the age distribution using a density plot:
Code
library(ggplot2)
# One density curve per cluster (KmEuc), one facet row per Gender
p1 <- ggplot() +
  geom_freqpoly(aes(x = Age, color = KmEuc), stat = 'density', position = 'dodge', data = mwe) +
  scale_color_manual(guide = guide_legend(), name = 'Clusters',
                     values = c("#E31A1C", "#332288", "#66A61E"),
                     labels = c("Pie", "Carrot", "Rice")) +
  theme_light(base_size = 14) +
  facet_grid(facets = Gender ~ .) +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank())
Attempt
To add count labels, I tried the following:
library(dplyr)
# Number of observations for each Age / Gender / KmEuc combination
dfLabels <- mwe %>%
  select(c(Age, Gender, KmEuc)) %>%
  group_by(Age, Gender, KmEuc) %>%
  dplyr::summarise(N = n())
p1 + geom_label(data = dfLabels, aes(x = Age, y = 0.01, label = N), size = 3, vjust = 0, hjust = 0)
Because of y = 0.01, I can only show N along a fixed horizontal line. How can I make N appear on the density curve instead?
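Answer
Try this. Besides computing the counts, I also compute the density at each age and use it as the y position of the label. The general idea is borrowed, but I adapted it to your question and used a tidyverse approach.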
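The key step is reading the height of the density curve at each observed age. As a minimal sketch of that step on its own, using base R only and a toy vector (the names `ages`, `age_uniq`, `n_at_age` and `y_at_age` are illustrative and not part of the question's data): `density()` returns the curve on a regular grid, and `approx()` interpolates it linearly at the ages of interest.

```r
# Toy ages (illustrative only; not the question's data)
ages <- c(21, 21, 23, 23, 23, 26, 30)

# Kernel density estimate, returned as grid points dens$x with heights dens$y
dens <- density(ages)

# Distinct ages and the number of observations at each one
age_uniq <- sort(unique(ages))
n_at_age <- vapply(age_uniq, function(a) sum(ages == a), numeric(1))

# Height of the density curve at each distinct age, by linear interpolation
y_at_age <- approx(dens$x, dens$y, xout = age_uniq)$y

# One row per label: where to put it (Age, label.y) and what it says (label.n)
data.frame(Age = age_uniq, label.y = y_at_age, label.n = n_at_age)
```

The code that follows applies this same step once per Gender and KmEuc group.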
library(ggplot2)
library(purrr)
library(dplyr)
library(tidyr)
dfLabels <- mwe %>%
  select(Age, Gender, KmEuc) %>%
  group_by(Gender, KmEuc) %>%
  nest() %>%
  # Compute the kernel density estimate for each Gender / KmEuc group
  mutate(dens = purrr::map(data, ~ density(.$Age))) %>%
  # Unique ages within each group
  mutate(age_uniq = purrr::map(data, ~ unique(.$Age))) %>%
  unnest(age_uniq)
dfLabels1 <- dfLabels %>%
  # Compute "y" by interpolating the density at each age, and count the observations at that age
  mutate(label.y = purrr::map2_dbl(age_uniq, dens, ~ approx(.y$x, .y$y, .x)$y),
         label.n = purrr::map2_dbl(age_uniq, data, ~ sum(.y$Age == .x))) %>%
  select(Gender, KmEuc, Age = age_uniq, label.y, label.n)
p1 <- ggplot() +
  geom_freqpoly(aes(x = Age, color = KmEuc), stat = 'density', position = 'dodge', data = mwe) +
  # Place each count on its group's density curve
  geom_text(aes(x = Age, y = label.y, color = KmEuc, label = label.n),
            position = 'dodge', vjust = 0, show.legend = FALSE, data = dfLabels1) +
  scale_color_manual(guide = guide_legend(), name = 'Clusters',
                     values = c("#E31A1C", "#332288", "#66A61E"),
                     labels = c("Pie", "Carrot", "Rice")) +
  theme_light(base_size = 14) +
  facet_grid(facets = Gender ~ .) +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank())
p1
#> Warning: Width not defined. Set with `position_dodge(width = ?)`
#> Warning: Width not defined. Set with `position_dodge(width = ?)`
Created on 2020-04-11 by the reprex package (v0.3.0)
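The two `Width not defined` warnings above come from `position = 'dodge'`, which needs a width that neither lines nor text labels define. Here is a minimal sketch of one way to avoid them, assuming dodging is not actually needed for the curves and labels (that is my assumption, not something stated in the question). The toy data frame `toy` and the helper table `toy_labels` are illustrative stand-ins for `mwe` and `dfLabels1`. Leaving both layers at their default position, `"identity"`, keeps the labels at the interpolated heights.

```r
library(ggplot2)

# Toy data (illustrative only): two clusters with seven ages each
toy <- data.frame(
  Age   = c(21, 21, 23, 23, 23, 26, 30, 22, 24, 24, 25, 28, 28, 31),
  KmEuc = factor(rep(c("1", "2"), each = 7))
)

# One label per cluster and distinct age, placed on that cluster's density curve
toy_labels <- do.call(rbind, lapply(split(toy, toy$KmEuc), function(d) {
  dens <- density(d$Age)
  ages <- sort(unique(d$Age))
  data.frame(
    KmEuc   = d$KmEuc[1],
    Age     = ages,
    label.y = approx(dens$x, dens$y, xout = ages)$y,
    label.n = vapply(ages, function(a) sum(d$Age == a), numeric(1))
  )
}))

# No `position` argument: both layers use the default "identity" position
p <- ggplot() +
  geom_freqpoly(aes(x = Age, color = KmEuc), stat = "density", data = toy) +
  geom_text(aes(x = Age, y = label.y, color = KmEuc, label = label.n),
            vjust = 0, show.legend = FALSE, data = toy_labels)
p
```

If the labels of different clusters overlap too much, passing an explicit `position_dodge(width = ...)` to `geom_text()` is the alternative the warning itself suggests. The width value would have to be chosen by eye.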