In ggplot, how to order bars from high to low within each facet, and still relocate one specific bar to a position of choice?

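I want to create a bar plot in which the bars are ordered from high to low, and I want that ordering preserved when the plot is wrapped into facets. Fortunately, the {tidytext} package has a function, reorder_within(), that does exactly this. However, I can't find a way to apply this within-facet high-to-low reordering and also manually relocate one specific bar.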

Example

Below I have adapted code from this blog.
Suppose we want to count baby names per decade.

Step 1 -- data

We will visualize top_names, which is a subset of the babynames dataset.

library(babynames)
library(dplyr, warn.conflicts = FALSE)

top_names <- 
  babynames %>%
  filter(between(year, 1950, 1990)) %>%
  mutate(decade = (year %/% 10) * 10) %>%
  group_by(decade) %>%
  count(name, wt = n, sort = TRUE) %>%
  ungroup()

top_names
#> # A tibble: 123,205 x 3
#>    decade name         n
#>     <dbl> <chr>    <int>
#>  1   1950 James   846042
#>  2   1950 Michael 839459
#>  3   1960 Michael 836934
#>  4   1950 Robert  832336
#>  5   1950 John    799658
#>  6   1950 David   771242
#>  7   1960 David   736583
#>  8   1960 John    716284
#>  9   1970 Michael 712722
#> 10   1960 James   687905
#> # ... with 123,195 more rows

Created on 2021-08-11 by the reprex package (v2.0.0)

Step 2 -- prepare the data for plotting

library(tidytext)
library(ggplot2)

data_for_plot <- 
  top_names %>%
  group_by(decade) %>%
  top_n(15) %>%
  ungroup() %>%
  mutate(decade = as.factor(decade),
         name = reorder_within(name, n, decade))
#> Selecting by n

data_for_plot
#> # A tibble: 75 x 3
#>    decade name                n
#>    <fct>  <fct>           <int>
#>  1 1950   James___1950   846042
#>  2 1950   Michael___1950 839459
#>  3 1960   Michael___1960 836934
#>  4 1950   Robert___1950  832336
#>  5 1950   John___1950    799658
#>  6 1950   David___1950   771242
#>  7 1960   David___1960   736583
#>  8 1960   John___1960    716284
#>  9 1970   Michael___1970 712722
#> 10 1960   James___1960   687905
#> # ... with 65 more rows

Step 3 -- visualize

p_so_far_so_good <- data_for_plot %>%
  ggplot(aes(name, n, fill = decade)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~decade, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0))

p_so_far_so_good


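So far, so good!

Now, within each facet (i.e., each decade), I want to lump together all bars other than the top 15. The top 15 should stay as they are, and the rest should be grouped as "other". I also want to visualize this new categorization in a way that preserves the high-to-low order of the bars while still pinning the "other" bar at the bottom. I have already posted about this elsewhere, but the solution given there does not apply to the within-facet case.

My failed attempt #1

Creating the "other" category for each group took some wrangling.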

data_for_plot_with_other <-
  top_names %>%
  group_by(decade) %>%
  arrange(decade, desc(n)) %>%
  mutate(name = ifelse(row_number() < 15, name, "other")) %>%
  group_by(decade, name) %>%
  summarise(across(n, sum)) %>%
  arrange(name == "other", -n, .by_group = TRUE) %>%
  ungroup() %>%
  mutate(decade = as.factor(decade),
         name = reorder_within(name, n, decade))

data_for_plot_with_other %>% print(n = 20)
## # A tibble: 75 x 3
##    decade name                   n
##    <fct>  <fct>              <int>
##  1 1950   James___1950      846042
##  2 1950   Michael___1950    839459
##  3 1950   Robert___1950     832336
##  4 1950   John___1950       799658
##  5 1950   David___1950      771242
##  6 1950   Mary___1950       627098
##  7 1950   William___1950    592423
##  8 1950   Linda___1950      565481
##  9 1950   Richard___1950    536393
## 10 1950   Patricia___1950   460643
## 11 1950   Thomas___1950     455154
## 12 1950   Susan___1950      438419
## 13 1950   Deborah___1950    431302
## 14 1950   Mark___1950       383076
## 15 1950   other___1950    30863329 ## see how "other" closes the decade = 1950 group
## 16 1960   Michael___1960    836934
## 17 1960   David___1960      736583
## 18 1960   John___1960       716284
## 19 1960   James___1960      687905
## 20 1960   Robert___1960     653556
# ... with 55 more rows

Visualize

data_for_plot_with_other %>%
  ggplot(aes(name, n, fill = factor(decade))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~decade, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0))

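This is not what I want.

What I want is something like the following:

My failed attempt #2

Along the same lines, I also tried to relevel the factor name. I built a helper function that detects the substring other_ and moves it to the end (of the group).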

move_to_end <- function(x, match_to_pattern = "other_") {
  which_idx <- grep(pattern = match_to_pattern, x = x)
  c(x[-which_idx], x[which_idx])
}

data_for_plot_with_other %>%
  group_by(decade) %>%
  mutate(across(name, ~fct_relevel(.x, move_to_end))) %>%
  ## then the visualization as before
  ggplot(aes(name, n, fill = factor(decade))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~decade, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0))

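Still the same output.

In summary

How can I get the bars ordered from high to low within each facet and always have the "other" bar at the bottom?

I'm new to this, but could you use fct_infreq() to accomplish this?

It makes more sense to sort in descending order first and then use summarize() to make sure "others" comes last in each group. From there, you only need to turn the names into a factor in the order in which they appear: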

dat = top_names %>%
      group_by(decade) %>%
      arrange(decade, desc(n)) %>%
      summarize(decade = mean(decade),
            name = c(name[1:15],"others"),
            n = c(n[1:15],sum(n) - sum(n[1:15]))
            ) %>%
      mutate(decade = as.factor(decade),
         name = reorder_within(name, n():1, decade))

ggplot(dat,aes(name, n, fill = factor(decade))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~decade, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0))

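I think I would use:

  • forcats::fct_lump_n() to create the "Other" category within each decade, using the w argument to weight by the counts we have already computed
  • forcats::fct_relevel() to move the newly created "Other___" levels to the beginning of the one big overall factor
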
library(tidyverse)
library(babynames)

top_names <- 
    babynames %>%
    filter(between(year, 1950, 1990)) %>%
    mutate(decade = (year %/% 10) * 10) %>%
    group_by(decade) %>%
    count(name, wt = n, sort = TRUE, name = "total") %>%
    ungroup()


library(tidytext)

data_for_plot <- 
    top_names %>%
    group_by(decade) %>%
    mutate(name = fct_lump_n(name, n = 15, w = total)) %>%
    group_by(decade, name) %>%
    # sum all lumped names into "Other"; sum(unique(total)) would drop tied totals
    summarise(total = sum(total), .groups = "drop") %>%
    mutate(decade = as.factor(decade),
           name = reorder_within(name, total, decade),
           name = fct_relevel(name, paste0("Other___", unique(decade))))

data_for_plot %>%
    ggplot(aes(total, name, fill = decade)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~decade, scales = "free_y") +
    scale_y_reordered() +
    scale_x_continuous(expand = c(0,0))
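
To make the two forcats steps easier to follow in isolation, here is a small sketch on hypothetical toy data (the names and totals are invented for illustration). It assumes only that forcats is installed and is not meant as a replacement for the pipeline above.

```r
library(forcats)

# Hypothetical toy data: five names with made-up totals
name  <- c("Ann", "Bob", "Cy", "Dee", "Eve")
total <- c(50, 40, 3, 2, 1)

# Keep the 2 names with the largest weighted counts; lump the rest into "Other"
lumped <- fct_lump_n(name, n = 2, w = total)
as.character(lumped)
#> [1] "Ann"   "Bob"   "Other" "Other" "Other"

# fct_relevel() moves the named level to the front by default
releveled <- fct_relevel(lumped, "Other")
levels(releveled)
#> [1] "Other" "Ann"   "Bob"
```

Because ggplot2 draws the first level of a discrete y axis at the bottom, moving the "Other___" levels to the front is what pins those bars to the bottom of each facet.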

Created on 2021-08-12 by the reprex package (v2.0.1)