Grouping data outside limits in histogram using ggplot2

Question

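I am trying to plot a histogram of part of my data. My problem is that I want everything outside the plotted range to be grouped into a final category, "10+". Is it possible to do this with ggplot2?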

Sample code:

library(ggplot2)
library(scales)  # provides percent for the axis labels

x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) + 
  geom_histogram(aes(y =  (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) + 
  scale_y_continuous(labels = percent) +
  coord_cartesian(xlim=c(0, 10)) +
  scale_x_continuous(breaks = 0:10)

This is how the histogram looks now: (image: How the histogram looks now)

This is how I would like it to look: (image: How the histogram should look)

This could probably be done with nested ifelses, but since I run into this problem fairly often, is there a way to have ggplot do it?

Answer 1

You can use forcats and dplyr to bin the values, collapse the last "levels" into one, and compute the percentages before plotting. Something like this should work:

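library(forcats)
library(dplyr)
library(ggplot2)
library(scales)  # provides percent for the axis labels

x <- data.frame(x = runif(10000, 0, 15))
x2 <- x %>%
  # bin into unit-wide intervals (0,1], (1,2], ..., (14,15]
  mutate(x_grp = cut(x, breaks = seq(0, 15, 1))) %>% 
  # merge the five bins above 10 (levels 11 to 15) into a single "10+" level
  mutate(x_grp = fct_collapse(x_grp, `10+` = levels(x_grp)[11:15])) %>% 
  group_by(x_grp) %>% 
  dplyr::summarize(count = n())

# the counts are precomputed, so draw the bars with stat = "identity"
ggplot(x2, aes(x = x_grp, y = count / sum(count))) + 
  geom_bar(stat = "identity", colour = "grey50") + 
  scale_y_continuous(labels = percent)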

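The resulting plot looks quite different from your example, but I believe it is correct, because the data are drawn from a uniform distribution: each unit-wide bin holds roughly 1/15 (about 6.7%) of the values, so the collapsed final category is far taller than any of the other bars.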
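
The reasoning about the uniform distribution can be checked numerically. What follows is only a rough sketch and not part of the original answer: it assumes base R alone, an arbitrary seed, and hypothetical bin labels such as "0-1", and it uses cut() with an open-ended last break (Inf) as a simpler stand-in for the fct_collapse step. It compares the observed share of each bin with the theoretical share, 1/15 per unit-wide bin and 5/15 for "10+".

```r
# Sanity check under assumed settings: Uniform(0, 15) data, 10,000 draws.
set.seed(42)  # arbitrary seed, used here only for reproducibility
x <- runif(10000, 0, 15)

# Bin into (0,1], ..., (9,10] plus one open-ended last bin labelled "10+".
# The labels are illustrative; any 11 labels would do.
grp <- cut(x, breaks = c(0:10, Inf),
           labels = c(paste0(0:9, "-", 1:10), "10+"))

# Observed share of each bin versus the theoretical share
observed <- as.numeric(prop.table(table(grp)))
expected <- c(rep(1 / 15, 10), 5 / 15)

check <- data.frame(bin = levels(grp),
                    observed = round(observed, 3),
                    expected = round(expected, 3))
print(check)
```

If the observed and expected columns agree closely, the tall last bar is a property of the data rather than a plotting error.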
