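How to label the count of each bin within ggridges package?
I have a data frame of simulated NFL seasons with 2 columns: team and rank. I am trying to use ggridges to plot the frequency distribution of each team at each rank from 1 to 10. I can get the plot to work, but I would like to show the count for each team/rank in each bin. So far I have been unsuccessful.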
ggplot(results,
aes(x=rank, y=team, group = team)) +
geom_density_ridges2(aes(fill=team), stat='binline', binwidth=1, scale = 0.9, draw_baseline=T) +
scale_x_continuous(limits = c(0,11), breaks = seq(1,10,1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL)
This creates the plot (image omitted).
I tried adding the following layer to get the counts added to each bin, but it did not work.
geom_text(stat='bin', aes(y = team + 0.95*stat(count/max(count)),
label = ifelse(stat(count) > 0, stat(count), ""))) +
Not the exact data set, but this should at least be enough to run the original plot:
results = data.frame(team = rep(c('Jets', 'Giants', 'Washington', 'Falcons', 'Bengals', 'Jaguars', 'Texans', 'Cowboys', 'Vikings'), 1000), rank = sample(1:20,9000,replace = T))
How about calculating the count for each bin, joining it to the original data, and using the new variable n as the label?
library(dplyr) # for count, left_join
library(ggplot2)
library(ggridges) # for geom_density_ridges2, theme_ridges
results %>%
count(team, rank) %>%
left_join(results) %>%
ggplot(aes(rank, team, group = team)) +
geom_density_ridges2(aes(fill = team),
stat = 'binline',
binwidth = 1,
scale = 0.9,
draw_baseline = TRUE) +
scale_x_continuous(limits = c(0, 11),
breaks = seq(1, 10, 1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F",
"#0C264C", "#192E6C", "#136677", "#203731"), name = NULL) +
geom_text(aes(label = n),
color = "white",
nudge_y = 0.2)
Result: (plot image omitted)
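One thing to note about the join: `left_join(results)` brings back one row per original observation, so each label ends up drawn many times on top of itself. A minimal sketch of a variant, assuming the same simulated `results` data frame, keeps the counts in a separate table and passes it only to `geom_text()` through its `data` argument. The `set.seed()` call, the name `bin_counts`, and the text size are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(ggplot2)
library(ggridges)

set.seed(1)  # assumption: added only so the simulated data is reproducible
results <- data.frame(
  team = rep(c("Jets", "Giants", "Washington", "Falcons", "Bengals",
               "Jaguars", "Texans", "Cowboys", "Vikings"), 1000),
  rank = sample(1:20, 9000, replace = TRUE)
)

# One row per team/rank combination, restricted to the ranks shown on the x axis
bin_counts <- results %>%
  count(team, rank) %>%
  filter(rank <= 10)

p <- ggplot(results, aes(rank, team, group = team)) +
  geom_density_ridges2(aes(fill = team),
                       stat = "binline",
                       binwidth = 1,
                       scale = 0.9,
                       draw_baseline = TRUE) +
  # Labels come from the summary table, so each count is drawn exactly once
  geom_text(data = bin_counts, aes(label = n), nudge_y = 0.2, size = 3) +
  scale_x_continuous(limits = c(0, 11), breaks = 1:10) +
  theme_ridges() +
  theme(legend.position = "none")

p
```

Because `geom_text()` inherits `x = rank` and `y = team` from the top-level `aes()`, the summary table only needs the columns `team`, `rank`, and `n`.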
Neilfws's answer is good, but I always find geom_ridgeline hard to work with in this situation, so I usually recreate the ridges with geom_rect:
library(dplyr)
library(ggplot2)
library(ggridges) # for theme_ridges
results %>%
count(team, rank) %>%
filter(rank<=10) %>%
mutate(team=factor(team)) %>%
ggplot() +
geom_rect(aes(xmin=rank-0.5, xmax=rank+0.5, ymin=team, fill=team,
ymax=as.numeric(team)+n*0.75/max(n))) +
geom_text(aes(x=rank, y=as.numeric(team)-0.1, label=n)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930",
"#00143F", "#0C264C", "#192E6C", "#136677",
"#203731"), name = NULL) +
ylab("team")
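I particularly like the fine level of control that geom_rect gives compared with ridgelines. But you do lose the nice border line drawn around each ridge, so if that matters, go with the other answer.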
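To make that control explicit, here is a rough sketch, assuming the same simulated `results` data, that computes the rectangle coordinates as ordinary columns before plotting. The column names `base` and `top`, the `set.seed()` call, and the `colour = "grey20"` outline are illustrative assumptions. The outline is drawn around each individual bar, so it is only an approximation of the single border that a ridgeline has.

```r
library(dplyr)
library(ggplot2)
library(ggridges)

set.seed(1)  # assumption: added only so the simulated data is reproducible
results <- data.frame(
  team = rep(c("Jets", "Giants", "Washington", "Falcons", "Bengals",
               "Jaguars", "Texans", "Cowboys", "Vikings"), 1000),
  rank = sample(1:20, 9000, replace = TRUE)
)

rects <- results %>%
  count(team, rank) %>%
  filter(rank <= 10) %>%
  mutate(team = factor(team),
         base = as.numeric(team),          # baseline of each team's row
         top  = base + 0.75 * n / max(n))  # tallest bar is 0.75 units high

team_levels <- levels(rects$team)

p <- ggplot(rects) +
  geom_rect(aes(xmin = rank - 0.5, xmax = rank + 0.5,
                ymin = base, ymax = top, fill = team),
            colour = "grey20") +           # per-bar outline
  geom_text(aes(x = rank, y = base - 0.1, label = n), size = 3) +
  # y is numeric here, so the team names are restored as axis labels
  scale_y_continuous(breaks = seq_along(team_levels), labels = team_levels) +
  scale_x_continuous(breaks = 1:10) +
  theme_ridges() +
  theme(legend.position = "none") +
  ylab("team")

p
```

Keeping `base` and `top` in the data frame makes it easy to change the bar height factor or the label offset in one place.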